Recognition: no theorem link
OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling
Pith reviewed 2026-05-13 06:31 UTC · model grok-4.3
The pith
OptArgus deploys a conductor-plus-specialist multi-agent architecture and a new four-category hallucination taxonomy to detect structural errors in LLM-generated optimization models more reliably than single-agent baselines on a three-part benchmark.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Against a matched single-agent baseline, OptArgus produces fewer false alarms on clean artifacts, more accurate top-ranked localization on controlled single-error cases, and stronger detection on natural model outputs.
Load-bearing premise
That the four-category hallucination taxonomy (objective, variable, constraint, implementation) comprehensively captures the structural inconsistencies that matter for optimization modeling correctness.
read the original abstract
Large language models (LLMs) are increasingly used to translate natural-language optimization problems into mathematical formulations and solver code, but matching the reference objective value is not a reliable test of correctness: an artifact may agree numerically while still changing the underlying optimization semantics. We formulate this issue as \emph{optimization-modeling hallucination detection}, namely structural consistency auditing over the problem description, symbolic model, and solver implementation. We develop, to our knowledge, the first fine-grained hallucination taxonomy specifically for optimization modeling, spanning objective, variable, constraint, and implementation failures. We use this taxonomy to design OptArgus, a multi-agent detector with conductor routing, specialist auditors, and evidence consolidation. To evaluate this setting, we introduce a three-part benchmark suite with $484$ clean artifacts, $1266$ controlled injected artifacts, and $6292$ natural LLM-generated artifacts. Against a matched single-agent baseline, OptArgus produces fewer false alarms on clean artifacts, more accurate top-ranked localization on controlled single-error cases, and stronger detection on natural model outputs. Together, these contributions turn optimization-modeling hallucination detection into a concrete empirical problem and suggest that modular, taxonomy-grounded auditing is a practical route to more reliable optimization modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a four-category taxonomy (objective, variable, constraint, implementation) for hallucinations in LLM-generated optimization models, proposes OptArgus as a multi-agent detector with conductor routing and specialist auditors, and presents a three-part benchmark (484 clean artifacts, 1266 controlled injected artifacts, 6292 natural LLM-generated artifacts) on which OptArgus outperforms a matched single-agent baseline in false-alarm rate, top-ranked localization, and detection strength.
Significance. If the central claims hold, the work converts optimization-modeling hallucination detection into a concrete empirical task and shows that taxonomy-grounded modular auditing can reduce false positives and improve localization relative to single-agent baselines. The introduced benchmark suite would be a reusable resource for the community.
major comments (2)
- [Evaluation] Evaluation section: the 1266 controlled injected artifacts are constructed by applying errors drawn exclusively from the four-category taxonomy, so reported gains in localization accuracy on single-error cases and stronger detection on natural outputs are measured against a taxonomy-defined benchmark; this does not test whether other structural inconsistencies (e.g., incorrect solver tolerances, duality violations, or scaling-induced instability) exist and would be missed by the specialist auditors.
- [Benchmark] Benchmark construction for the 6292 natural artifacts: detection strength is reported without an independent ground-truth process that could reveal missed hallucination categories, leaving open the possibility that the multi-agent advantage is inflated by systematic under-detection outside the taxonomy.
minor comments (2)
- [Abstract] The abstract states 'to our knowledge, the first fine-grained taxonomy'; a brief related-work paragraph should explicitly compare against any prior taxonomies in code generation or mathematical reasoning to support this claim.
- [Methods] Methods description of the conductor routing and evidence consolidation steps would benefit from a small diagram or pseudocode to clarify the information flow between agents.
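For orientation, a minimal sketch of what such pseudocode could look like, based only on the flow named in the abstract (conductor routing, specialist auditors, evidence consolidation). Every function and field name here is an illustrative assumption, not the paper's actual interface:

```python
# Illustrative OptArgus-style flow: conductor routing -> specialist auditors
# -> evidence consolidation. The audit() body is a placeholder where an LLM
# specialist call would go; nothing here is the paper's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    category: str      # objective | variable | constraint | implementation
    evidence: str      # short quoted span from problem text, model, or code
    confidence: float

SPECIALISTS = ("objective", "variable", "constraint", "implementation")

def conductor_route(problem: str, model: str, code: str) -> tuple:
    # The conductor summarizes the artifact and decides which specialist
    # modules to invoke; this stub simply invokes all four.
    return SPECIALISTS

def audit(specialist: str, problem: str, model: str, code: str) -> list:
    # Each specialist emits atomic root-cause findings in its own category,
    # recording cross-category suspicions as dependencies (omitted here).
    return []

def consolidate(findings: list, max_findings: int = 3) -> list:
    # Evidence consolidation: collapse duplicate findings for the same root
    # cause and keep only the best-supported ones.
    unique = {(f.category, f.evidence): f for f in findings}
    return sorted(unique.values(), key=lambda f: -f.confidence)[:max_findings]

def detect(problem: str, model: str, code: str) -> list:
    findings = []
    for specialist in conductor_route(problem, model, code):
        findings.extend(audit(specialist, problem, model, code))
    return consolidate(findings)
```

The sketch encodes two properties the specialist rules below insist on: findings are atomic root causes (deduplicated in consolidation), and each specialist stays inside its own taxonomy category.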
Simulated Author's Rebuttal
Thank you for your thorough review and constructive feedback. We address each major comment below, acknowledging valid points about benchmark scope while clarifying the design rationale and planned revisions.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section: the 1266 controlled injected artifacts are constructed by applying errors drawn exclusively from the four-category taxonomy, so reported gains in localization accuracy on single-error cases and stronger detection on natural outputs are measured against a taxonomy-defined benchmark; this does not test whether other structural inconsistencies (e.g., incorrect solver tolerances, duality violations, or scaling-induced instability) exist and would be missed by the specialist auditors.
Authors: We agree that the controlled artifacts are generated exclusively from the four-category taxonomy by design, which enables controlled measurement of detection and localization performance within those categories. Issues such as solver tolerances, duality violations, or scaling instability are typically downstream effects or implementation details rather than primary structural hallucinations in the modeling phase. In the revision we will add an explicit discussion of the taxonomy scope, note these as candidate extensions for future work, and clarify that the evaluation targets modeling-level structural consistency as defined in the paper. revision: partial
-
Referee: [Benchmark] Benchmark construction for the 6292 natural artifacts: detection strength is reported without an independent ground-truth process that could reveal missed hallucination categories, leaving open the possibility that the multi-agent advantage is inflated by systematic under-detection outside the taxonomy.
Authors: The natural artifacts are assessed relative to the taxonomy because constructing fully independent ground truth across all conceivable error types would require large-scale expert annotation outside the current study scope. The reported gains are comparative (OptArgus vs. single-agent baseline) within the taxonomy-defined setting. We will revise the manuscript to include a dedicated limitations paragraph that explicitly states this boundary and notes that undetected categories outside the taxonomy remain possible. revision: partial
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Structural consistency between the problem description, symbolic model, and solver implementation is a valid proxy for detecting optimization-modeling hallucinations.
invented entities (2)
-
OptArgus multi-agent detector with conductor routing and specialist auditors
no independent evidence
-
Four-category hallucination taxonomy (objective, variable, constraint, implementation)
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Extract a faithful semantic schema from the natural-language problem specification
-
[2]
Separate hard requirements from soft preferences
-
[3]
Summarize the objective intent, entities, parameters, index sets, and implementation-sensitive details.
-
[4]
Decide which specialist modules should actually be invoked for this case
-
[5]
Produce routing guidance for the selected specialists. Important rules: - Do not invent missing business rules. - If the problem statement is underspecified, record an explicit assumption. - Stay neutral: you are not yet deciding whether the model is correct, only preparing the audit. - Keep the output faithful to the text because downstream experts will ...
-
[6]
Focus only on objective-function issues
-
[7]
Compare the natural-language objective intent against the symbolic model
-
[8]
Use the most specific hallucination type justified by the evidence
-
[9]
If the root cause is really a variable, constraint, or implementation issue, record it as an unresolved dependency rather than mislabeling it as an objective error.
-
[10]
Quote short evidence spans from the problem and symbolic model whenever possible
-
[11]
Use taxonomy-grounded examples when explaining why a pattern is suspicious
-
[12]
Emit atomic root-cause findings only. Do not emit downstream consequences of the same objective mistake as separate findings.
-
[13]
Do not use solver-code-only issues to justify an objective finding in this module. Those belong to the implementation expert.
-
[14]
If the problem text states a clear objective sense such as `minimize cost` or `maximize profit`, and the symbolic model reverses that sense, prefer `Wrong Optimization Direction` even if other missing constraints are also present.
-
[15]
If the problem text and symbolic model disagree on objective sense, prefer an objective finding even when the solver code happens to align with the problem text. A code path that merely compensates for a symbolic objective reversal should not suppress the primary objective diagnosis.
-
[16]
If the objective omits or misbinds one entity/index while the rest of the formulation is structurally similar, prefer `Wrong Index Range`, `Misaligned Time Indices`, or `Coefficient Misbinding` over a generic omission label.
-
[17]
Do not replace an explicitly stated `minimize`/`minimum`/`least` objective with a business-intuition `maximize` interpretation unless the text itself clearly contradicts the symbolic sense. Unusual objectives are still valid if the text literally states them. Output instructions: - Return at most {max_findings} findings, but prefer 0-3 unique root-cause...
-
[18]
Focus on whether the model created the right decision objects
-
[19]
Check dimensions, domains, sign restrictions, index sets, and variable-role couplings
-
[20]
Compare how variables are defined and how they are used in the symbolic objective and symbolic constraints.
-
[21]
If a suspicious issue is really caused by objective logic, missing constraints, or code-only divergence, record that as an unresolved dependency rather than forcing a variable label.
-
[22]
Use the taxonomy examples to explain why a pattern changes the meaning of the optimization problem.
-
[23]
Emit atomic root-cause findings only. Do not split one variable-design mistake into several paraphrases.
-
[24]
Do not use solver-code-only issues to justify a variable finding in this module. Those belong to the implementation expert.
-
[25]
Use `Relaxing a Discrete Variable into a Continuous Variable` only when the problem text or symbolic specification clearly requires whole numbers, binary decisions, counts, yes/no choices, trips, facilities, assignments, or other discrete decisions, but the symbolic model uses a continuous domain.
-
[26]
Do not relabel wrong coefficients, wrong units, wrong percentages, or missing policy/share constraints as variable errors just because variables appear in those expressions. Those are usually constraint or objective issues.
- [27]
-
[28]
Treat mixed domains inside one sibling decision family as strong evidence for a variable-domain hallucination. If one `trip`, `ride`, `shipment`, `facility`, or `units-to-produce` variable is continuous while parallel decision variables remain integer or binary, prefer a variable-domain label over a constraint label.
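A toy illustration of this mixed-domain cue (the variable names and the check itself are hypothetical, not from the paper):

```python
# Hypothetical sibling "trip" decision family. One continuous member among
# otherwise integer siblings is the cue treated as strong evidence for
# `Relaxing a Discrete Variable into a Continuous Variable`.
declared_domains = {
    "trips_truck": "integer",
    "trips_rail": "integer",
    "trips_barge": "continuous",  # odd one out in the same decision family
}

if len(set(declared_domains.values())) > 1:
    suspects = [v for v, d in declared_domains.items() if d == "continuous"]
    print("Mixed domains in one decision family; suspect:", suspects)
```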
-
[29]
In narrative transportation or production problems, phrases such as `how many`, `number of trips`, `quantity to produce`, or `must be a whole number` should strongly increase confidence that the decision is discrete.
-
[30]
If the decision objects themselves are otherwise correct, do not use `Wrong Variable Object` as a fallback when the real issue is a domain relaxation. Prefer `Relaxing a Discrete Variable into a Continuous Variable` whenever the text clearly implies counts, trips, assignments, facilities, or whole-number quantities.
-
[31]
Do not infer integrality from `how many` alone when the surrounding problem is a standard LP, blending, mixture, resource-allocation, or continuous-quantity formulation.
-
[32]
If the problem explicitly asks for an LP or otherwise frames the decision as continuous amounts, shares, mixtures, or allocations, abstain from a discrete-relaxation label unless there is a separate explicit integer/binary requirement.
-
[33]
If your only evidence is a generic counting phrase but there are no strong indivisibility cues such as trips, facilities, yes/no selection, assignments, or explicit whole-number requirements, prefer abstention over a speculative domain hallucination.
-
[34]
Do not flag `Omitted Index Set` merely because a small fixed family of scalar variables is used instead of a cleaner indexed notation. If the scalar variables faithfully cover the intended decision objects and there is no semantic loss, abstain.
-
[35]
Do not flag a domain/sign/value-range error when the candidate explanation itself says the current integer/binary domain is already correct, or when the only complaint is that a bound is expressed in variable declarations rather than as a separate explicit constraint.
-
[36]
If the symbolic model introduces a shadow/alias variable that represents the same business decision as an existing variable, prefer `Duplicate Variable Roles` even when downstream constraints or the objective also become miscounted.
-
[37]
When a duplicated decision object later causes double counting in a cardinality/budget expression, treat the duplicate variable as the primary root cause and leave the aggregation consequence to dependencies unless there is an independently wrong pooled rule. Output instructions: - Return at most {max_findings} findings, but prefer 0-3 unique root-c...
-
[38]
Focus on semantic translation, missing skeletons, implicit rules, logic chains, aggregation/indexing, boundaries, strength, and scheduling-specific structure.
-
[39]
Compare the textual rules in the problem with the symbolic constraints
-
[40]
Look for situations where the model stays solvable but is semantically wrong
-
[41]
Use Appendix-B-style example cues to justify the label
-
[42]
If a problem is actually driven by wrong variables or implementation-only code divergence, record it as a dependency instead of forcing a constraint label.
-
[43]
Emit atomic root-cause findings only. Do not report both a cause and its obvious downstream consequence as separate findings unless they are independently actionable.
-
[44]
Do not use solver-code-only issues to justify a constraint finding in this module. Those belong to the implementation expert.
-
[45]
When the missing rule is a percentage, share, quota, composition, or policy-style requirement linking otherwise correct variables, prefer a business-rule or implicit-rule label over a capacity/coverage skeleton label.
-
[46]
Reserve `Missing Capacity or Coverage Skeleton` for genuinely missing resource, demand, coverage, flow, or capacity structures.
-
[47]
Reserve `Missing Default Business Rules` for missing policy constraints, ratio/share requirements, default eligibility rules, or other non-resource business logic that should still be hard constraints.
-
[48]
If the only issue is a variable domain/type error and the relevant constraint is otherwise correctly stated, leave it to the variable expert instead of forcing a constraint subtype.
-
[49]
Use `Wrong Aggregation Level` only when the text requires an explicit pooled or combined limit over multiple variables or entities, but the symbolic model drops that pooled total, replaces it with separate per-variable bounds, or otherwise collapses the required aggregation structure.
-
[50]
Prefer `Wrong Aggregation Level` only when the evidence really is about a single joint total such as `pooled total`, `combined total`, `joint budget`, `overall budget`, or an explicit `sum across` construction. Do not use it merely because a constraint contains the word `total`.
-
[51]
Use `Missing Capacity or Coverage Skeleton` only when the entire resource/coverage family is absent, not when the family exists but the aggregation granularity is wrong.
-
[52]
If the model still has the right variables and a related resource theme but misses the single joint budget, total-hours, total-demand, or pooled-allocation equation, treat it as aggregation/index-coding rather than a generic missing skeleton. But do not relabel ordinary balance, recursion, nutrient minimum, demand minimum, or throughput constraints as agg...
-
[53]
Distinguish carefully between the three high-frequency families: `Wrong Aggregation Level` = a pooled total such as `X+Y`, `all trips`, `combined staff`, `overall budget`, or `sum across modes` is replaced by separate bounds or omitted. `Missing Capacity or Coverage Skeleton` = the model omits an actual resource, demand, flow, nutrition, coverage, or throughpu...
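An illustrative minimal instance of the first family (the numbers are hypothetical, not from the paper): a required pooled total replaced by separate per-variable bounds is a strict relaxation.

```latex
% Wrong Aggregation Level, minimal case:
\text{required (pooled): } x + y \le 10
\qquad
\text{modeled (separate): } x \le 10, \quad y \le 10
% The separate bounds admit x = y = 10, which violates the pooled total,
% so the modeled feasible region strictly contains the required one.
```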
-
[54]
Do not call a nutrient minimum, demand minimum, throughput minimum, flow balance, or total shipped/produced requirement a default business rule. Those are capacity/coverage-style structure unless the issue is specifically a pooled-total aggregation mistake.
-
[55]
If another module already captures the root cause as an objective-sense reversal, a discrete-variable domain relaxation, or a code-only implementation mismatch, avoid emitting a generic constraint label that would outrank that root cause.
-
[56]
Use `Missing Initial or Terminal Conditions` when the omitted rule is an explicit lower bound, upper bound, initial state, terminal state, starting inventory, ending inventory, or one-sided boundary condition on an otherwise correctly defined variable or flow. Missing `x <= U`, `x >= L`, start-of-horizon, or end-of-horizon conditions should not be labeled as ...
-
[57]
If the text says `at least`, `at most`, `minimum`, `maximum`, `starts with`, `ends with`, `initial`, or `terminal`, and the symbolic model drops that one-sided or endpoint condition, prefer `Missing Initial or Terminal Conditions` over `Missing Capacity or Coverage Skeleton` or `Missing Default Business Rules`.
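A hypothetical instance of this rule: the text says inventory must end with at least 20 units; the flow balance is fully present while the endpoint condition is dropped.

```latex
% Missing Initial or Terminal Conditions, minimal case (symbols hypothetical):
I_t = I_{t-1} + p_t - d_t \quad (t = 1, \dots, T)   % balance present in model
I_T \ge 20                                           % stated in text, omitted
```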
-
[58]
If a boundary-style constraint is partially present but attached to the wrong variable or wrong side, still prefer `Missing Initial or Terminal Conditions` when the core error is the loss of an explicit bound or endpoint condition.
-
[59]
Do not upgrade an explicit one-way conditional such as `if A, then not B` into mutual exclusion unless the text clearly states `cannot both`, `either-or`, `incompatible in either direction`, or an equivalent symmetric prohibition.
-
[60]
Use `Wrong Interpretation of the Rule` only for a clear semantic contradiction. If the current model is a plausible literal reading and the concern is merely an alternative stricter interpretation, abstain instead of emitting a hallucination finding.
-
[61]
If your own reasoning depends on phrases such as `may imply`, `could imply`, `might imply`, `potential ambiguity`, or `if that interpretation is intended`, that is usually evidence to abstain rather than to label.
-
[62]
If the real failure is that the symbolic model duplicated or shadowed a decision variable and the pooled total is only downstream double counting, do not relabel that as `Wrong Aggregation Level`; leave it to the variable expert as `Duplicate Variable Roles`.
-
[63]
Do not flag a percentage/share/quota rule merely because the symbolic model uses an algebraically rearranged linear form. If `omega <= 0.35(total)` is rewritten as an equivalent linear inequality such as `0.65*omega - 0.35*alpha <= 0`, abstain.
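The equivalence this rule protects is plain algebra, assuming the total is $\omega + \alpha$ (the item's own symbols):

```latex
\omega \le 0.35\,(\omega + \alpha)
\;\Longleftrightarrow\;
\omega - 0.35\,\omega - 0.35\,\alpha \le 0
\;\Longleftrightarrow\;
0.65\,\omega - 0.35\,\alpha \le 0
```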
-
[64]
If the omitted rule is a cardinality-style pooled count such as `at most 4 children`, `select at least 3 projects`, or `open no more than k facilities`, prefer `Wrong Aggregation Level` over `Missing Initial or Terminal Conditions`. Output instructions: - Return at most {max_findings} findings, but prefer 0-3 unique root-cause findings. - Prefer precise ...
-
[65]
Compare the symbolic model against the code, not just the code against the problem text
-
[66]
Treat the symbolic model as the source of truth for this module. Emit an implementation finding only when the code changes the mathematical object, drops indices, uses the wrong API/domain, or otherwise breaks symbolic-to-code fidelity.
-
[67]
If the symbolic formulation itself is already wrong and the code faithfully mirrors it, do not report an implementation hallucination for that issue. Record a dependency or note instead.
-
[68]
Check objective sense, variable domains, loop/index expansion, omitted constraints, solver compatibility, and post-solve reporting.
-
[69]
When the symbolic model says `minimize` but the code materializes `maximize`, negates objective coefficients, flips the reported objective sign, or sets the wrong solver API sense, prefer `Wrong Objective Sense in Code`.
-
[70]
A pure code-level objective-sense reversal is still an implementation hallucination even if the objective terms themselves look numerically similar.
-
[71]
Do not flag `Wrong Objective Sense in Code` when the code uses a standard, internally consistent sign-handling convention that still matches the symbolic objective. For example, `minimize` with direct positive coefficients and an inactive post-solve sign flip is not an objective-sense error.
-
[72]
Likewise, `maximize` implemented via negated coefficients plus an active post-solve sign correction is not an objective-sense error if it faithfully matches the symbolic objective.
-
[73]
Be especially careful with`scipy.optimize.milp`or similar minimization-oriented APIs. A pattern of`c = -c_ref`, followed by solving, followed by`objective_value = -result.fun`for a maximize objective is a standard faithful wrapper, not a hallucination. ,→ ,→
-
[74]
A dead branch such as `if "minimize" == "maximize": objective_value = -objective_value` inside an otherwise ordinary minimization implementation is not, by itself, `Wrong Objective Sense in Code`. If the branch never executes and the solver call already matches the symbolic sense, abstain.
-
[75]
Do not flag `Wrong Bounds in Code` or `Omitted Constraint Materialization` when the only difference is an equivalent presentation style, such as encoding a lower/upper bound in the solver `Bounds` object instead of duplicating it as a separate linear row, or encoding an equality as a pair of matching `>=` and `<=` rows.
-
[76]
If the candidate explanation itself admits that the implementation already matches the symbolic model, that no divergence exists, or that no fix is needed, abstain instead of emitting an implementation finding.
- [77]
-
[78]
Use short evidence spans from code and symbolic text
-
[79]
Emit atomic root-cause findings only. Do not restate a symbolic modeling error unless the code introduces an additional independent divergence.
-
[80]
If the problem text and symbolic model already disagree on objective direction, do not let a code path that matches the textual objective suppress the upstream objective diagnosis. In that situation, only emit `Wrong Objective Sense in Code` if the code also diverges from the symbolic model in an independently actionable way.
discussion (0)