NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
We present NEMO, a system that translates Natural-language descriptions of decision problems into formal Executable Mathematical Optimization implementations using autonomous coding agents (ACAs). Existing approaches rely on specialized large language models (LLMs) or bespoke task-specific agents that are often brittle and frequently generate syntactically invalid or non-executable code. NEMO instead treats ACAs as a first-class abstraction analogous to API-based interaction with LLMs; their sandboxed execution guarantees code is executable by construction and supports automated validation and repair. We introduce novel coordination patterns including asymmetric validation loops between independently generated optimizer and simulator implementations, external memory for experience reuse, and robustness enhancements via minimum Bayes risk (MBR) decoding and self-consistency. Across nine established optimization benchmarks, NEMO achieves state-of-the-art performance on the majority of tasks with substantial margins on several datasets, demonstrating the power of execution-aware agentic architectures for automated optimization modeling.
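The abstract names minimum Bayes risk (MBR) decoding and self-consistency as robustness enhancements but does not describe how they are implemented. The following is only a minimal sketch of the general idea, assuming each independently generated candidate has already been executed and reduced to a single objective value. The function names `mbr_select` and `value_disagreement` and the tolerance are hypothetical and are not taken from NEMO.

```python
from typing import Callable, Sequence


def value_disagreement(a: float, b: float, tol: float = 1e-6) -> float:
    """Hypothetical 0/1 loss: 0.0 if two objective values agree within a
    relative tolerance, 1.0 otherwise."""
    scale = max(1.0, abs(a), abs(b))
    return 0.0 if abs(a - b) <= tol * scale else 1.0


def mbr_select(candidates: Sequence[float],
               loss: Callable[[float, float], float]) -> float:
    """Return the candidate with the lowest total loss against all other
    candidates (the minimum Bayes risk choice under a uniform weighting)."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def risk(i: int) -> float:
        # Sum of pairwise losses between candidate i and every other candidate.
        return sum(loss(candidates[i], candidates[j])
                   for j in range(len(candidates)) if j != i)

    best_index = min(range(len(candidates)), key=risk)
    return candidates[best_index]


if __name__ == "__main__":
    # Illustrative objective values from five hypothetical candidate runs.
    values = [42.0, 42.0, 17.5, 42.0000001, 40.0]
    print(mbr_select(values, value_disagreement))
```

With a 0/1 agreement loss, the MBR choice coincides with a majority vote over the candidates, which is the self-consistency case. A graded loss (for example, absolute difference between objective values) would instead favor the most central candidate.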
fields
cs.AI (1)
years
2026 (1)
verdicts
UNVERDICTED (1)
representative citing papers
ORAgentBench: Can LLM Agents Solve Challenging Operations Research Tasks End to End?
ORAgentBench evaluates 14 LLM agent configurations on 107 end-to-end OR tasks and finds the best agent passes only 35.51% overall and 20.59% of hard tasks.