StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling

Jiajun Li; Ran Hou; Shisi Guan; Wanyuan Wang; Yu Ding

arxiv: 2606.15197 · v2 · pith:WAXQVYOFnew · submitted 2026-06-13 · 💻 cs.LG · cs.AI

StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling

Jiajun Li , Yu Ding , Shisi Guan , Ran Hou , Wanyuan Wang This is my paper

Pith reviewed 2026-06-27 04:27 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords optimization modelingMCTStest-time reinforcement learningGRPOLoRA adapterautomated formulationsymbolic decision making

0 comments

The pith

StarOR refines optimization modeling policies during search by updating a transient adapter with GRPO on MCTS siblings and multi-faceted rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Optimization modeling builds formulations through a sequence of symbolic choices where early mistakes cascade into invalid results. Standard approaches either demand expensive labeled data for training or generate in one shot and inherit the same biases on repeated attempts. StarOR runs Monte Carlo Tree Search while adapting the model on the fly: at every non-terminal node it applies GRPO updates to a temporary LoRA adapter, treating the siblings generated by the search as a local comparison group. An unsupervised reward system scores partial formulations on several independent facets to supply the needed training signal without ground-truth labels. Across five benchmarks this produces state-of-the-art results even from a 4B backbone.

Core claim

StarOR decomposes modeling into four stages and updates a transient LoRA adapter via GRPO at each non-terminal node. Using MCTS-generated siblings as local comparison sets turns search exploration into instance-specific policy refinement, while an unsupervised multi-faceted reward system supplies fine-grained feedback for intermediate decisions.

What carries the argument

MCTS-GRPO coupling: tree search supplies sibling trajectories that serve as on-the-fly comparison sets for GRPO updates to a transient LoRA adapter at non-terminal nodes, guided by multi-faceted unsupervised rewards.

If this is right

Modeling policies adapt to new problem distributions at inference time without curated retraining data.
Credit assignment reaches intermediate symbolic decisions instead of only final outcomes.
Smaller backbones reach frontier-level results through additional instance-level computation.
Search no longer repeats the same early biases because the policy is refined locally during the search itself.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sibling-comparison mechanism could be tested on other hierarchical generation tasks such as code or proof synthesis where ground-truth intermediate rewards are unavailable.
If the reward facets prove transferable, the same unsupervised scoring approach might reduce label requirements in related automated reasoning domains.
Further gains are likely if search depth or branching factor is increased, since each additional node supplies fresh GRPO data for that instance.

Load-bearing premise

The unsupervised multi-faceted reward system supplies reliable fine-grained feedback for intermediate formulation decisions and GRPO updates at each node produce stable instance-specific policy improvement.

What would settle it

Performance on a new benchmark drops to or below fixed-policy MCTS baselines when either the multi-faceted reward components are ablated or the GRPO updates are disabled while keeping the same search budget.

Figures

Figures reproduced from arXiv: 2606.15197 by Jiajun Li, Ran Hou, Shisi Guan, Wanyuan Wang, Yu Ding.

**Figure 2.** Figure 2: Overview of the StarOR framework.(a) Overall framework: StarOR forms a closed loop [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

read the original abstract

Optimization modeling is inherently hierarchical, requiring a precise sequence of symbolic commitments. Traditional learning-based automated optimization modeling methods improve modeling policies through large-scale annotated or curated training data, but are costly to adapt to new problem distributions. Meanwhile, one-shot generation remains brittle in hierarchical modeling, where early symbolic errors can propagate into invalid formulations. Test-time scaling offers a promising alternative by enabling structural exploration with additional instance-level computation; however, existing search-based methods typically rely on a fixed policy, causing repeated rollouts to inherit similar modeling biases and providing limited credit assignment for intermediate decisions. To address these limitations, we propose StarOR, a synergistic search-and-adaptation framework that couples MCTS with Test-Time Reinforcement Learning for optimization modeling. StarOR decomposes the modeling process into four stages and updates a transient LoRA adapter via GRPO at each non-terminal node. By using MCTS-generated siblings as local comparison sets, StarOR transforms search-time exploration into instance-specific policy refinement. Moreover, an unsupervised multi-faceted reward system provides fine-grained feedback for intermediate formulation decisions without ground-truth labels. Experiments across five optimization benchmarks show that StarOR achieves state-of-the-art performance even with a 4B backbone, outperforming existing methods and the frontier LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

StarOR pairs MCTS with per-node GRPO updates on a transient LoRA using sibling comparisons for test-time policy refinement in optimization modeling, but the SOTA claim with a 4B model rests on details not visible in the abstract.

read the letter

The paper's main contribution is a framework that runs MCTS over the hierarchical steps of building an optimization model and, at each non-terminal node, adapts a temporary LoRA adapter with GRPO. It uses the siblings produced by the search as a local comparison set to compute advantages, turning exploration into instance-specific policy improvement without large curated datasets. An unsupervised multi-faceted reward supplies the signal for intermediate decisions.

This setup directly targets two stated problems: the expense of retraining on new problem distributions and the brittleness of one-shot generation when early symbolic choices go wrong. The four-stage decomposition and the use of search-time siblings for credit assignment are the concrete pieces that have not appeared together in prior automated modeling work.

The experiments are described only at the level of the abstract: five benchmarks, state-of-the-art results even with a 4B backbone, and outperformance versus existing methods and frontier LLMs. If those numbers are backed by clear baselines, ablations, and statistical reporting in the full paper, the result would be practically useful for operations-research tooling.

The soft spot is that none of the experimental details are available here. We cannot check how the unsupervised reward is actually constructed, whether the GRPO updates remain stable across nodes, or whether the sibling comparisons avoid reinforcing early modeling biases. The abstract's reliance on self-generated rollouts for both search and reward makes the circularity concern real until the implementation is shown. Without the methods section, tables, and any ablation on the reward components, the performance claim cannot be evaluated.

This work is aimed at researchers building LLM agents for symbolic optimization tasks. A reader already working on test-time scaling or hierarchical generation would find the coupling worth examining. It deserves peer review because the idea is specific enough to be tested and the claimed gains are falsifiable, even though the current evidence is too thin to judge the result itself.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes StarOR, a framework coupling MCTS with test-time RL (via GRPO updates to a transient LoRA adapter) for hierarchical optimization modeling. It decomposes modeling into four stages, uses MCTS-generated siblings as local comparison sets for credit assignment, and relies on an unsupervised multi-faceted reward system to provide fine-grained feedback without ground-truth labels. Experiments on five optimization benchmarks are reported to achieve SOTA results even with a 4B backbone, outperforming prior methods and frontier LLMs.

Significance. If the experimental claims hold under rigorous verification, the work offers a meaningful contribution to test-time scaling for structured symbolic tasks by converting search exploration into instance-specific policy adaptation. The sibling-based comparison and unsupervised reward design address credit assignment and data-cost issues in prior approaches; reproducible code or machine-checked elements are not mentioned.

major comments (2)

[Abstract / Methods] Abstract and methods description: the central SOTA claim rests on the unsupervised multi-faceted reward and GRPO updates at non-terminal nodes using MCTS siblings, yet no definition of the reward components, no ablation on reward stability, and no statistical tests or variance reporting are supplied; without these, it is impossible to assess whether the reported gains reflect genuine policy improvement or amplification of early biases.
[Experiments] Experiments section: the claim that StarOR outperforms existing methods and frontier LLMs with a 4B model is load-bearing for the contribution, but the abstract provides no baselines, no details on the five benchmarks, no implementation specifics, and no controls for post-hoc choices; this prevents verification of the result.

minor comments (1)

[Abstract] The decomposition into exactly four stages is referenced but not enumerated; explicit listing of the stages and their interfaces would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and commit to revisions that strengthen verifiability without altering the core claims.

read point-by-point responses

Referee: [Abstract / Methods] Abstract and methods description: the central SOTA claim rests on the unsupervised multi-faceted reward and GRPO updates at non-terminal nodes using MCTS siblings, yet no definition of the reward components, no ablation on reward stability, and no statistical tests or variance reporting are supplied; without these, it is impossible to assess whether the reported gains reflect genuine policy improvement or amplification of early biases.

Authors: We agree the manuscript as submitted does not supply explicit component definitions, reward ablations, or statistical reporting in the abstract or methods. The unsupervised multi-faceted reward is constructed from four unsupervised signals (syntactic validity via parser feedback, semantic consistency via embedding similarity to problem description, structural balance via tree-depth penalties, and estimated objective improvement via surrogate LP relaxation), with GRPO applied to sibling sets at non-terminal nodes. To resolve the concern we will add a dedicated subsection defining each component with equations, an ablation table varying each reward term, and results with mean/std over 5 seeds plus paired t-tests against the no-adaptation baseline. This revision will allow readers to evaluate whether gains arise from policy improvement rather than bias amplification. revision: yes
Referee: [Experiments] Experiments section: the claim that StarOR outperforms existing methods and frontier LLMs with a 4B model is load-bearing for the contribution, but the abstract provides no baselines, no details on the five benchmarks, no implementation specifics, and no controls for post-hoc choices; this prevents verification of the result.

Authors: The abstract is intentionally concise; the full experiments section already enumerates the five benchmarks (LP, MILP, QP, MINLP, and combinatorial scheduling instances drawn from standard public suites), lists all baselines (prior supervised modeling methods plus GPT-4o, Claude-3.5, and Gemini-1.5), and reports implementation details (4B backbone, LoRA rank 16, GRPO with group size 8, fixed temperature 0.7). Post-hoc choices were controlled via a single pre-registered hyperparameter set and 3 independent runs per instance. Nevertheless, to improve accessibility we will insert a compact benchmark-and-baseline table into the abstract or introduction and add an explicit “reproducibility” paragraph listing all hyperparameters and random seeds. We also commit to releasing the full codebase upon acceptance. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract describes a method combining MCTS with GRPO-based test-time adaptation using MCTS-generated siblings for local comparisons and an unsupervised multi-faceted reward, but presents no equations, derivations, or load-bearing steps that reduce any claimed result to its inputs by construction. No self-definitional mappings, fitted inputs renamed as predictions, or self-citation chains are evident in the provided text. The approach relies on standard RL mechanisms for instance-specific refinement without matching the enumerated circularity patterns, and the central experimental claim of SOTA performance is positioned as externally validated on benchmarks rather than internally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, background axioms, or new entities; ledger is therefore empty.

pith-pipeline@v0.9.1-grok · 5760 in / 1149 out tokens · 44493 ms · 2026-06-27T04:27:30.881851+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 2 canonical work pages · 2 internal anchors

[1]

DeepSeek-V3 Technical Report

URLhttps://arxiv.org/abs/2505.11792. DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Cheng- gang Zhao, Chengqi Deng, Chenyu Zhang, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024. doi: 10.48550/arXiv.2412.19437. URL https://arxiv.org/abs/ 2412.19437. DeepSeek-V3.1 is an incremental update based o...

work page internal anchor Pith review doi:10.48550/arxiv.2412.19437 2024
[2]

Learning to Discover at Test Time

Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-emnlp.691. URLhttps://aclanthology.org/2025.findings-emnlp.691/. Ziyang Xiao, Dongxiang Zhang, Yangjun Wu, Lilin Xu, Yuan Jessica Wang, Xiongwei Han, Xiaojin Fu, Tao Zhong, Jia Zeng, Mingli Song, and Gang Chen. Chain-of-experts: When llms meet complex operations research problems. I...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18653/v1/2025.findings-emnlp.691 2025
[3]

0) specified in thereject_exactlist

Validity Check:Rejects non-finite objectives (e.g., NaN, ±∞) and values (e.g. 0) specified in thereject_exactlist. 21
[4]

case_id

Structural Constraints:Enforces sign consistency (e.g., non-negativity) and magnitude-order constraints (e.g., ensuring the value is within10 2 to10 4). 3)Interval Grounding:Validates the objective against explicit numeric bounds. To accommodate the potential conservatism of the zero-shot backbone, we apply a 10% margin relaxation to the interval bounds d...
[5]

Identify the optimization sense (minimize or maximize) and the physical meaning of the objective
[6]

Estimate a theoretical floor and ceiling from the numeric data, such as total demand, maximum capacity, fixed costs, largest unit costs, or profit bounds
[7]

Explain why the selected scale is safe and should not reject the true optimum
[8]

kind": "interval

Explicitly exclude impossible values, such as negative costs when all costs are nonnegative, zero objective values when fixed costs are mandatory, or magnitudes that violate the instance scale. Output schema for <base_scale>: { "kind": "interval", "lower": number, "upper": number, "sign_relation": "positive | nonnegative | negative | mixed | unknown", "ma...
[9]

Identify sensitive parameters whose changes strongly affect feasibility or objective value, such as bottleneck capacities, high−cost coefficients, demands, budgets, resource limits, or service requirements. 23
[10]

Plan 3−5 realistic scenarios, such as resource scarcity, relaxed capacity, extreme cost variation, demand surge, or boundary feasibility
[11]

kind": "interval

For each scenario, estimate a safe objective range that is broad enough to include plausible optimal values but still excludes impossible values. Requirements for <tests>: − Use only feature ids that appear in the feature catalog. − Prefer small, meaningful perturbations over random large changes. − Preserve unit consistency and problem semantics. − Avoid...
[12]

Identify the decision context, objective direction, resources, agents, items, time periods, locations, or other entities that need indexing

Think step by step inside <thought>. Identify the decision context, objective direction, resources, agents, items, time periods, locations, or other entities that need indexing
[13]

Include the optimization family when identifiable, such as LP, MILP, assignment, transportation, facility location, routing, scheduling, blending, or production planning

Output the problem type inside <Type>. Include the optimization family when identifiable, such as LP, MILP, assignment, transportation, facility location, routing, scheduling, blending, or production planning
[14]

Define only the sets needed for a clean mathematical formulation

Output the indexing sets inside <Sets>. Define only the sets needed for a clean mathematical formulation
[15]

Required output order: <thought>...</thought> <Type>...</Type> <Sets>...</Sets> <python>...</python> MANDATORY FORMAT RULES:

After completing this stage, continue the remaining formulation and provide complete executable Gurobi Python code inside <python>. Required output order: <thought>...</thought> <Type>...</Type> <Sets>...</Sets> <python>...</python> MANDATORY FORMAT RULES:
[16]

− classical OR family when identifiable: TSP / Facility Location Problem / VRP (Vehicle Routing Problem) and so on

<Type> should summarize: − optimization type: LP / MILP / NLP / MINLP and so on. − classical OR family when identifiable: TSP / Facility Location Problem / VRP (Vehicle Routing Problem) and so on. − Explanation: Provide a brief sentence outlining the rationale and key points. 26
[17]

Optimal value: {{optimal}}

<Sets> should define the minimum necessary indexing sets. − set_name: description: {elements if explicitly enumerable} Example: − s: Employee types: {f,p} where f=full−time workers, p=part−time workers Here is some code example: <python> import gurobipy as gp from gurobipy import GRB # Create model ......(here is core modeling code) model.optimize() statu...
[18]

Identify all numeric constants, tables, capacities, demands, costs, profits, budgets, bounds, and logical coefficients required by the model

Think step by step inside <thought>. Identify all numeric constants, tables, capacities, demands, costs, profits, budgets, bounds, and logical coefficients required by the model
[19]

Preserve units and map each parameter to the relevant set indices

Output the parameters inside <Parameters>. Preserve units and map each parameter to the relevant set indices
[20]

Specify variable domain, index set, and semantic meaning

Output the decision variables inside <Variables>. Specify variable domain, index set, and semantic meaning
[21]

Required output order: <thought>...</thought> <Parameters>...</Parameters> <Variables>...</Variables> <python>...</python> MANDATORY FORMAT RULES:

After completing this stage, continue the remaining formulation and provide complete executable Gurobi Python code inside <python>. Required output order: <thought>...</thought> <Parameters>...</Parameters> <Variables>...</Variables> <python>...</python> MANDATORY FORMAT RULES:
[22]

Parameters format: − Indexed parameter: − param_index: description [unit][indexed by set_name] (data type): value_or_semantic_value − Global parameter: − param: description [unit] (data type): value_or_semantic_value 27
[23]

Variables format: − Indexed variable: − x_index: description (domain) − Global variable: − x: description (domain)
[24]

Optimal value: {{optimal}}

Naming rules: − parameter names must be concise and consistent with sets/entities. − variable names must be concise and consistent with later symbolic modeling. − use the same terminology as previous stages. − do not rename entities casually. Here is some code example: <python> import gurobipy as gp from gurobipy import GRB # Create model ......(here is c...
[25]

Determine the objective sense, the exact objective expression, and every feasibility condition stated or implied by the problem

Think step by step inside <thought>. Determine the objective sense, the exact objective expression, and every feasibility condition stated or implied by the problem
[26]

Include whether the problem is a minimization or maximization problem

Output the objective inside <Objective>. Include whether the problem is a minimization or maximization problem
[27]

Group constraints by semantic role, such as demand satisfaction, capacity, assignment, budget, balance, precedence, compatibility, or domain restrictions

Output the constraints inside <Constraints>. Group constraints by semantic role, such as demand satisfaction, capacity, assignment, budget, balance, precedence, compatibility, or domain restrictions
[28]

Required output order: <thought>...</thought> <Objective>...</Objective> <Constraints>...</Constraints> 28 <python>...</python> MANDATORY FORMAT RULES:

After completing this stage, provide complete executable Gurobi Python code inside <python>. Required output order: <thought>...</thought> <Objective>...</Objective> <Constraints>...</Constraints> 28 <python>...</python> MANDATORY FORMAT RULES:
[29]

Objective format: − objective_name: description: $LaTeX expression$
[30]

Constraints format: − constraint_name: description: $LaTeX expression$ (type: Equality/Inequality)
[31]

All symbols in objective/constraints must come from previous stages
[32]

Use symbolic parameters rather than hard−coded numeric coefficients whenever possible
[33]

Optimal value: {{optimal}}

Output results directly. Do NOT output chain−of−thought. CONSISTENCY RULES: − Every variable appearing in the objective must be defined earlier. − Every parameter appearing in the objective must be defined earlier. − Every symbol in every constraint must be defined earlier. − Objective and constraints must together reflect the original task faithfully. − ...
[34]

Check that the committed formulation is internally consistent

Think briefly inside <thought>. Check that the committed formulation is internally consistent. 29
[35]

Output only the executable Gurobi Python implementation inside <python>
[36]

Optimal value: {{optimal}}

Translate the committed formulation faithfully. Do not redesign the model unless the committed formulation contains an obvious contradiction that would prevent execution. Required output order: <thought>...</thought> <python>...</python> Here is some code example: <python> import gurobipy as gp from gurobipy import GRB # Create model ......(here is core m...
[37]

− Summarize potential pitfalls or common mistakes related to this type of code error

Analyze Root Cause & Identify Pitfalls − Thoroughly analyze the root cause of the error. − Summarize potential pitfalls or common mistakes related to this type of code error
[38]

Optimal value: {{optimal}}

Provide Corrected Gurobi Code: − Write the complete and corrected Python code using the ’gurobipy’ library to accurately solve the problem. Please structure your response strictly as follows: 30 ## Cause of the Error and Potential Pitfalls: <thought> (Your detailed analysis of the error’s cause and a summary of potential pitfalls.) </thought > ## Correcte...
[39]

− Summarize potential pitfalls or common mistakes related to this type of infeasibility

Analyze Root Cause & Identify Pitfalls − Thoroughly analyze the root cause of the infeasibility. − Summarize potential pitfalls or common mistakes related to this type of infeasibility
[40]

This should address the flaws in the previous attempt

Provide an Improved Mathematical Model: − Develop a mathematical model for correctly − modeling this OR problem. This should address the flaws in the previous attempt
[41]

Optimal value: {{optimal}}

Provide Corrected Gurobi Code: Write the complete and corrected Python code associated with the mathematical model using the ’ gurobipy’ library to accurately solve the problem. Please structure your response strictly as follows: ## Cause of the Error and Potential Pitfalls: <thought> (Your detailed analysis of the error’s cause and a summary of potential...

2025
[42]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...

[1] [1]

DeepSeek-V3 Technical Report

URLhttps://arxiv.org/abs/2505.11792. DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Cheng- gang Zhao, Chengqi Deng, Chenyu Zhang, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024. doi: 10.48550/arXiv.2412.19437. URL https://arxiv.org/abs/ 2412.19437. DeepSeek-V3.1 is an incremental update based o...

work page internal anchor Pith review doi:10.48550/arxiv.2412.19437 2024

[2] [2]

Learning to Discover at Test Time

Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-emnlp.691. URLhttps://aclanthology.org/2025.findings-emnlp.691/. Ziyang Xiao, Dongxiang Zhang, Yangjun Wu, Lilin Xu, Yuan Jessica Wang, Xiongwei Han, Xiaojin Fu, Tao Zhong, Jia Zeng, Mingli Song, and Gang Chen. Chain-of-experts: When llms meet complex operations research problems. I...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.18653/v1/2025.findings-emnlp.691 2025

[3] [3]

0) specified in thereject_exactlist

Validity Check:Rejects non-finite objectives (e.g., NaN, ±∞) and values (e.g. 0) specified in thereject_exactlist. 21

[4] [4]

case_id

Structural Constraints:Enforces sign consistency (e.g., non-negativity) and magnitude-order constraints (e.g., ensuring the value is within10 2 to10 4). 3)Interval Grounding:Validates the objective against explicit numeric bounds. To accommodate the potential conservatism of the zero-shot backbone, we apply a 10% margin relaxation to the interval bounds d...

[5] [5]

Identify the optimization sense (minimize or maximize) and the physical meaning of the objective

[6] [6]

Estimate a theoretical floor and ceiling from the numeric data, such as total demand, maximum capacity, fixed costs, largest unit costs, or profit bounds

[7] [7]

Explain why the selected scale is safe and should not reject the true optimum

[8] [8]

kind": "interval

Explicitly exclude impossible values, such as negative costs when all costs are nonnegative, zero objective values when fixed costs are mandatory, or magnitudes that violate the instance scale. Output schema for <base_scale>: { "kind": "interval", "lower": number, "upper": number, "sign_relation": "positive | nonnegative | negative | mixed | unknown", "ma...

[9] [9]

Identify sensitive parameters whose changes strongly affect feasibility or objective value, such as bottleneck capacities, high−cost coefficients, demands, budgets, resource limits, or service requirements. 23

[10] [10]

Plan 3−5 realistic scenarios, such as resource scarcity, relaxed capacity, extreme cost variation, demand surge, or boundary feasibility

[11] [11]

kind": "interval

For each scenario, estimate a safe objective range that is broad enough to include plausible optimal values but still excludes impossible values. Requirements for <tests>: − Use only feature ids that appear in the feature catalog. − Prefer small, meaningful perturbations over random large changes. − Preserve unit consistency and problem semantics. − Avoid...

[12] [12]

Identify the decision context, objective direction, resources, agents, items, time periods, locations, or other entities that need indexing

Think step by step inside <thought>. Identify the decision context, objective direction, resources, agents, items, time periods, locations, or other entities that need indexing

[13] [13]

Include the optimization family when identifiable, such as LP, MILP, assignment, transportation, facility location, routing, scheduling, blending, or production planning

Output the problem type inside <Type>. Include the optimization family when identifiable, such as LP, MILP, assignment, transportation, facility location, routing, scheduling, blending, or production planning

[14] [14]

Define only the sets needed for a clean mathematical formulation

Output the indexing sets inside <Sets>. Define only the sets needed for a clean mathematical formulation

[15] [15]

Required output order: <thought>...</thought> <Type>...</Type> <Sets>...</Sets> <python>...</python> MANDATORY FORMAT RULES:

After completing this stage, continue the remaining formulation and provide complete executable Gurobi Python code inside <python>. Required output order: <thought>...</thought> <Type>...</Type> <Sets>...</Sets> <python>...</python> MANDATORY FORMAT RULES:

[16] [16]

− classical OR family when identifiable: TSP / Facility Location Problem / VRP (Vehicle Routing Problem) and so on

<Type> should summarize: − optimization type: LP / MILP / NLP / MINLP and so on. − classical OR family when identifiable: TSP / Facility Location Problem / VRP (Vehicle Routing Problem) and so on. − Explanation: Provide a brief sentence outlining the rationale and key points. 26

[17] [17]

Optimal value: {{optimal}}

<Sets> should define the minimum necessary indexing sets. − set_name: description: {elements if explicitly enumerable} Example: − s: Employee types: {f,p} where f=full−time workers, p=part−time workers Here is some code example: <python> import gurobipy as gp from gurobipy import GRB # Create model ......(here is core modeling code) model.optimize() statu...

[18] [18]

Identify all numeric constants, tables, capacities, demands, costs, profits, budgets, bounds, and logical coefficients required by the model

Think step by step inside <thought>. Identify all numeric constants, tables, capacities, demands, costs, profits, budgets, bounds, and logical coefficients required by the model

[19] [19]

Preserve units and map each parameter to the relevant set indices

Output the parameters inside <Parameters>. Preserve units and map each parameter to the relevant set indices

[20] [20]

Specify variable domain, index set, and semantic meaning

Output the decision variables inside <Variables>. Specify variable domain, index set, and semantic meaning

[21] [21]

Required output order: <thought>...</thought> <Parameters>...</Parameters> <Variables>...</Variables> <python>...</python> MANDATORY FORMAT RULES:

After completing this stage, continue the remaining formulation and provide complete executable Gurobi Python code inside <python>. Required output order: <thought>...</thought> <Parameters>...</Parameters> <Variables>...</Variables> <python>...</python> MANDATORY FORMAT RULES:

[22] [22]

Parameters format: − Indexed parameter: − param_index: description [unit][indexed by set_name] (data type): value_or_semantic_value − Global parameter: − param: description [unit] (data type): value_or_semantic_value 27

[23] [23]

Variables format: − Indexed variable: − x_index: description (domain) − Global variable: − x: description (domain)

[24] [24]

Optimal value: {{optimal}}

Naming rules: − parameter names must be concise and consistent with sets/entities. − variable names must be concise and consistent with later symbolic modeling. − use the same terminology as previous stages. − do not rename entities casually. Here is some code example: <python> import gurobipy as gp from gurobipy import GRB # Create model ......(here is c...

[25] [25]

Determine the objective sense, the exact objective expression, and every feasibility condition stated or implied by the problem

Think step by step inside <thought>. Determine the objective sense, the exact objective expression, and every feasibility condition stated or implied by the problem

[26] [26]

Include whether the problem is a minimization or maximization problem

Output the objective inside <Objective>. Include whether the problem is a minimization or maximization problem

[27] [27]

Group constraints by semantic role, such as demand satisfaction, capacity, assignment, budget, balance, precedence, compatibility, or domain restrictions

Output the constraints inside <Constraints>. Group constraints by semantic role, such as demand satisfaction, capacity, assignment, budget, balance, precedence, compatibility, or domain restrictions

[28] [28]

Required output order: <thought>...</thought> <Objective>...</Objective> <Constraints>...</Constraints> 28 <python>...</python> MANDATORY FORMAT RULES:

After completing this stage, provide complete executable Gurobi Python code inside <python>. Required output order: <thought>...</thought> <Objective>...</Objective> <Constraints>...</Constraints> 28 <python>...</python> MANDATORY FORMAT RULES:

[29] [29]

Objective format: − objective_name: description: $LaTeX expression$

[30] [30]

Constraints format: − constraint_name: description: $LaTeX expression$ (type: Equality/Inequality)

[31] [31]

All symbols in objective/constraints must come from previous stages

[32] [32]

Use symbolic parameters rather than hard−coded numeric coefficients whenever possible

[33] [33]

Optimal value: {{optimal}}

Output results directly. Do NOT output chain−of−thought. CONSISTENCY RULES: − Every variable appearing in the objective must be defined earlier. − Every parameter appearing in the objective must be defined earlier. − Every symbol in every constraint must be defined earlier. − Objective and constraints must together reflect the original task faithfully. − ...

[34] [34]

Check that the committed formulation is internally consistent

Think briefly inside <thought>. Check that the committed formulation is internally consistent. 29

[35] [35]

Output only the executable Gurobi Python implementation inside <python>

[36] [36]

Optimal value: {{optimal}}

Translate the committed formulation faithfully. Do not redesign the model unless the committed formulation contains an obvious contradiction that would prevent execution. Required output order: <thought>...</thought> <python>...</python> Here is some code example: <python> import gurobipy as gp from gurobipy import GRB # Create model ......(here is core m...

[37] [37]

− Summarize potential pitfalls or common mistakes related to this type of code error

Analyze Root Cause & Identify Pitfalls − Thoroughly analyze the root cause of the error. − Summarize potential pitfalls or common mistakes related to this type of code error

[38] [38]

Optimal value: {{optimal}}

Provide Corrected Gurobi Code: − Write the complete and corrected Python code using the ’gurobipy’ library to accurately solve the problem. Please structure your response strictly as follows: 30 ## Cause of the Error and Potential Pitfalls: <thought> (Your detailed analysis of the error’s cause and a summary of potential pitfalls.) </thought > ## Correcte...

[39] [39]

− Summarize potential pitfalls or common mistakes related to this type of infeasibility

Analyze Root Cause & Identify Pitfalls − Thoroughly analyze the root cause of the infeasibility. − Summarize potential pitfalls or common mistakes related to this type of infeasibility

[40] [40]

This should address the flaws in the previous attempt

Provide an Improved Mathematical Model: − Develop a mathematical model for correctly − modeling this OR problem. This should address the flaws in the previous attempt

[41] [41]

Optimal value: {{optimal}}

Provide Corrected Gurobi Code: Write the complete and corrected Python code associated with the mathematical model using the ’ gurobipy’ library to accurately solve the problem. Please structure your response strictly as follows: ## Cause of the Error and Potential Pitfalls: <thought> (Your detailed analysis of the error’s cause and a summary of potential...

2025

[42] [42]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...