pith. machine review for the scientific record

arxiv: 2604.17708 · v1 · submitted 2026-04-20 · 💻 cs.AI


Co-evolving Agent Architectures and Interpretable Reasoning for Automated Optimization

Jiahao Huang, Peilan Xu, Wenjian Luo, Xiaoya Nan


Pith reviewed 2026-05-10 05:18 UTC · model grok-4.3

classification 💻 cs.AI
keywords agent architecture evolution · operations research automation · LLM-based agents · graph-mediated evolution · interpretable reasoning · co-evolutionary optimization · AOE network representation · automated solver selection

The pith

Representing agent workflows as evolvable AOE-style networks and co-evolving their topologies with reasoning paths improves automated operations research performance and adds structural interpretability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that fixed, hand-crafted workflows limit LLMs on complex operations research tasks that need flexible coordination across interpretation, formulation, solver choice, code generation, and debugging. By representing these workflows explicitly as activity-on-edge networks, the method evolves a population of architectures and reasoning trajectories together using graph-based recombination and semantic mutations, while injecting reusable practices from a knowledge base. Empirical tests on varied OR benchmarks indicate consistent gains over zero-shot LLMs, static pipelines, and prior evolutionary agent systems. The explicit graph view also makes alternative reasoning paths and execution dependencies visible, supporting both performance and human-readable structure. If correct, this treats agent design itself as an optimizable, inspectable object rather than a static choice.

Core claim

The EvoOR-Agent framework represents agent workflows as AOE-style networks that expose topology, dependencies, and alternative paths. It then maintains an architecture graph and evolves populations of reasoning individuals via graph-mediated path-conditioned recombination, multi-granularity semantic mutation, and elitist updates, augmented by knowledge-base-assisted experience acquisition. On heterogeneous OR benchmarks this produces consistent improvements over zero-shot LLMs, fixed-pipeline agents, and earlier evolutionary frameworks, with case studies and ablations attributing gains to explicit architecture evolution and graph-supported trajectory search.
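The AOE-style representation is concrete enough to sketch. A minimal reading, where all names and structure are illustrative assumptions rather than the paper's implementation: states become graph nodes, agent activities such as interpretation or formulation sit on directed edges, and a reasoning individual is one source-to-sink path, so parallel edges encode the alternative paths the review highlights.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Activity:
    """An agent step carried on an edge, e.g. 'formulate' or 'debug'."""
    name: str
    prompt: str  # instruction template the LLM would execute

@dataclass
class AOEGraph:
    """Activity-on-edge workflow graph: nodes are intermediate states,
    edges carry agent activities. Parallel edges between the same pair
    of states represent alternative reasoning paths."""
    edges: dict[str, list[tuple[Activity, str]]] = field(default_factory=dict)

    def add(self, src: str, activity: Activity, dst: str) -> None:
        self.edges.setdefault(src, []).append((activity, dst))

    def paths(self, src: str, sink: str) -> list[list[Activity]]:
        """Enumerate all source-to-sink trajectories via DFS.
        Assumes the graph is acyclic, as AOE networks are by construction."""
        if src == sink:
            return [[]]
        out = []
        for act, nxt in self.edges.get(src, []):
            for tail in self.paths(nxt, sink):
                out.append([act] + tail)
        return out

g = AOEGraph()
g.add("start", Activity("interpret", "Restate the OR problem."), "understood")
g.add("understood", Activity("formulate_LP", "Write an LP model."), "modeled")
g.add("understood", Activity("formulate_MIP", "Write a MIP model."), "modeled")
g.add("modeled", Activity("generate_code", "Emit solver code."), "done")
print(len(g.paths("start", "done")))  # → 2
```

Under this reading, "exposing topology, dependencies, and alternative paths" is exactly what `paths` makes explicit: each enumerated trajectory is a candidate individual for the evolutionary search.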

What carries the argument

AOE-style network representation of agent workflows, which makes topology and dependencies explicit and supports graph-mediated path-conditioned recombination plus multi-granularity semantic mutation for joint evolution of architectures and reasoning.

If this is right

  • Agent coordination among interpretation, formulation, solver selection, code generation, and debugging becomes adaptive rather than hand-crafted.
  • Reasoning trajectories gain explicit, inspectable alternative paths through the graph representation.
  • Reusable OR practices can be systematically injected into both initialization and variation steps.
  • Performance improvements appear on heterogeneous benchmarks spanning different problem types and scales.
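The outer search implied above is a standard elitist evolutionary loop over trajectories. A generic skeleton follows; the fitness function, operators, rates, and the bit-string toy are all placeholders, not the paper's settings.

```python
import random

def evolve(population, fitness, recombine, mutate,
           generations=20, elite_frac=0.2, seed=0):
    """Elitist update sketch: score individuals, carry the top slice over
    unchanged, and refill the rest with recombined + mutated offspring
    drawn from the fitter half of the ranked population."""
    rng = random.Random(seed)
    n_elite = max(1, int(len(population) * elite_frac))
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elite]
        offspring = []
        while len(offspring) < len(population) - n_elite:
            pa, pb = rng.sample(ranked[: len(ranked) // 2], 2)
            child = recombine(pa, pb) or pa  # fall back if no valid crossover
            offspring.append(mutate(child, rng))
        population = elites + offspring
    return max(population, key=fitness)

# toy check on bit-strings: elitism guarantees the best never degrades
init = random.Random(42)
pop = [[init.randint(0, 1) for _ in range(8)] for _ in range(10)]
best = evolve(
    pop,
    fitness=sum,
    recombine=lambda a, b: a[:4] + b[4:],
    mutate=lambda x, rng: [g ^ 1 if rng.random() < 0.1 else g for g in x],
    generations=50,
)
```

In the paper's setting the individual would be a trajectory, fitness would come from executing the generated solver code, and mutation would operate at multiple semantic granularities (node, edge, or subpath) rather than bit flips.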

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-evolution approach could be tested on non-OR domains that also require multi-step reasoning and tool use, such as scientific modeling pipelines.
  • Structural interpretability might allow human experts to intervene or prune unproductive branches in deployed agent systems.
  • If architecture evolution proves robust, future agent frameworks could start from minimal seeds and grow task-specific topologies without manual redesign.
  • Scalability questions remain around how graph size and population size trade off against compute cost on larger industrial instances.

Load-bearing premise

Representing workflows as AOE-style networks and evolving them with graph-mediated recombination and semantic mutation will yield meaningful, generalizable gains on complex OR tasks.

What would settle it

Running the framework on a fresh collection of OR benchmarks and finding no consistent outperformance versus zero-shot LLMs and fixed-pipeline agents, or finding that ablations removing architecture evolution erase the reported gains.

Figures

Figures reproduced from arXiv: 2604.17708 by Jiahao Huang, Peilan Xu, Wenjian Luo, Xiaoya Nan.

Figure 1. Evolution of OR problem-solving paradigms. The upper path illustrates …
Figure 2. Architecture graph evolution. Individual OR agents are first abstracted into AOE chains, which are merged by phase-local state alignment to form the …
Figure 3. Overview of reasoning trajectory evolution on the current architecture graph. An LLM-agent-based experience-acquisition workflow retrieves relevant …
Figure 4. Population-size sensitivity with a fixed iteration depth of …
Figure 5. Convergence behavior with a fixed population size of …
Figure 6. Population dynamics across generations. The vertical axis denotes …
Original abstract

Automating operations research (OR) with large language models (LLMs) remains limited by hand-crafted reasoning–execution workflows. Complex OR tasks require adaptive coordination among problem interpretation, mathematical formulation, solver selection, code generation, and iterative debugging. To address this limitation, we propose EvoOR-Agent, a co-evolutionary framework for automated optimization. The framework represents agent workflows as activity-on-edge (AOE)-style networks, making workflow topology, execution dependencies, and alternative reasoning paths explicit. On this representation, the framework maintains an architecture graph and evolves a population of reasoning individuals through graph-mediated path-conditioned recombination, multi-granularity semantic mutation, and elitist population update. A knowledge-base-assisted experience-acquisition module further injects reusable OR practices into initialization and semantic variation. Empirical results on heterogeneous OR benchmarks show that the proposed framework consistently improves over zero-shot LLMs, fixed-pipeline OR agents, and representative evolutionary agent frameworks. Case studies and ablation analyses further indicate that explicit architecture evolution and graph-supported reasoning-trajectory search contribute to both performance improvement and structural interpretability. These results suggest that treating agent architectures and reasoning trajectories as evolvable objects provides an effective route toward adaptive and interpretable automated optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes EvoOR-Agent, a co-evolutionary framework for LLM-based automated optimization in operations research. Agent workflows are represented as activity-on-edge (AOE) networks to expose topology and dependencies; a population of reasoning individuals is then evolved via graph-mediated path-conditioned recombination, multi-granularity semantic mutation, and elitist selection, with an auxiliary knowledge-base module injecting reusable OR practices. The central claim is that this architecture evolution yields consistent performance gains over zero-shot LLMs, fixed-pipeline OR agents, and prior evolutionary agent frameworks on heterogeneous OR benchmarks, while also improving structural interpretability.

Significance. If the empirical results can be shown to arise specifically from the AOE-graph operators rather than ancillary factors, the work would offer a concrete, interpretable route to automated agent design for complex reasoning tasks. The explicit graph representation of workflows is a clear methodological contribution that could transfer to other multi-step agent systems. At present, however, the significance remains provisional because the manuscript supplies no quantitative metrics, population/generation details, or controlled ablations that would allow attribution of gains to the proposed mechanisms.

major comments (3)
  1. [Abstract / Empirical Results] Abstract and empirical evaluation: the claim that the framework 'consistently improves' over baselines is asserted without any numerical results, benchmark names, performance deltas, or error analysis. This is load-bearing for the central contribution; without these data the reader cannot evaluate whether the AOE-network evolution produces meaningful, generalizable gains.
  2. [Framework Description] Framework description: the knowledge-base-assisted experience-acquisition module is described as injecting 'reusable OR practices' into initialization and mutation, yet no details are given on its construction, automation, or curation process. If this module relies on human-curated examples or extra LLM queries absent from the fixed-pipeline baselines, the headline improvement cannot be attributed to the co-evolutionary operators.
  3. [Ablation Analyses / Case Studies] Ablation and case-study sections: while the abstract states that 'case studies and ablation analyses further indicate that explicit architecture evolution and graph-supported reasoning-trajectory search contribute' to gains, no quantitative ablation isolating graph-mediated recombination versus semantic mutation versus the knowledge base is supplied. This leaves the key mechanistic assumption untested.
minor comments (2)
  1. [Abstract] The acronym AOE is introduced in the abstract without immediate expansion; a parenthetical definition on first use would improve readability.
  2. [Framework Description] Notation for the architecture graph and path-conditioned recombination operators should be formalized (e.g., with a small diagram or pseudocode) to make the evolutionary operators reproducible.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for strengthening the empirical claims and mechanistic clarity. We address each major comment below and will revise the manuscript accordingly to incorporate the requested details, metrics, and analyses.

Point-by-point responses
  1. Referee: [Abstract / Empirical Results] Abstract and empirical evaluation: the claim that the framework 'consistently improves' over baselines is asserted without any numerical results, benchmark names, performance deltas, or error analysis. This is load-bearing for the central contribution; without these data the reader cannot evaluate whether the AOE-network evolution produces meaningful, generalizable gains.

    Authors: We agree that the abstract and empirical presentation would benefit from explicit numerical support. The current manuscript reports results on heterogeneous OR benchmarks but does not include specific deltas, benchmark names, or error analysis in the abstract. In the revision we will update the abstract to include key performance metrics, benchmark names, deltas, and error information drawn from the experiments. We will also add population and generation details to the methods and results sections to enable full evaluation of the gains. revision: yes

  2. Referee: [Framework Description] Framework description: the knowledge-base-assisted experience-acquisition module is described as injecting 'reusable OR practices' into initialization and mutation, yet no details are given on its construction, automation, or curation process. If this module relies on human-curated examples or extra LLM queries absent from the fixed-pipeline baselines, the headline improvement cannot be attributed to the co-evolutionary operators.

    Authors: We acknowledge that insufficient detail is currently provided on the knowledge-base module, which prevents clear attribution of improvements to the co-evolutionary operators versus the auxiliary component. In the revised manuscript we will expand the framework description with a dedicated subsection detailing the construction process, automation steps, curation of reusable OR practices, and any additional LLM queries employed. This will allow readers to assess the module's role relative to the fixed-pipeline baselines. revision: yes

  3. Referee: [Ablation Analyses / Case Studies] Ablation and case-study sections: while the abstract states that 'case studies and ablation analyses further indicate that explicit architecture evolution and graph-supported reasoning-trajectory search contribute' to gains, no quantitative ablation isolating graph-mediated recombination versus semantic mutation versus the knowledge base is supplied. This leaves the key mechanistic assumption untested.

    Authors: We agree that the current ablation and case-study material is insufficient to isolate the contributions of graph-mediated recombination, semantic mutation, and the knowledge base. Although the manuscript contains case studies, it lacks the requested quantitative controlled ablations. In the revision we will add a new subsection with quantitative ablation experiments, including performance tables that systematically remove or isolate each component to test the mechanistic assumptions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework proposal with external benchmarks

Full rationale

The paper presents EvoOR-Agent as a new co-evolutionary framework that explicitly represents agent workflows as AOE-style networks and applies graph-mediated recombination, semantic mutation, and knowledge-base injection. All central claims rest on empirical comparisons against zero-shot LLMs, fixed-pipeline agents, and other evolutionary frameworks on heterogeneous OR benchmarks, plus ablation studies. No equations, derivations, or first-principles results are described that reduce to self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The architecture and operators are introduced as independent design choices whose value is tested externally rather than assumed by construction. This is the normal case of a self-contained empirical proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on standard assumptions about LLM capabilities for OR subtasks and the applicability of evolutionary search to workflow graphs, plus two newly introduced entities with no independent evidence outside the proposal.

axioms (2)
  • domain assumption Large language models can reliably perform problem interpretation, mathematical formulation, solver selection, code generation, and iterative debugging for OR tasks when guided by structured workflows.
    Implicit foundation for using LLMs as the base reasoning engine.
  • domain assumption Evolutionary operators applied to graph representations of workflows can discover superior reasoning trajectories and architectures.
    Core premise enabling the co-evolution mechanism.
invented entities (2)
  • EvoOR-Agent co-evolutionary framework no independent evidence
    purpose: Automated optimization via joint evolution of agent architectures and reasoning paths
    Newly proposed system; no external validation cited.
  • AOE-style network representation of agent workflows no independent evidence
    purpose: Explicit encoding of workflow topology, dependencies, and alternative reasoning paths
    Core modeling choice introduced to support graph-mediated evolution.

pith-pipeline@v0.9.0 · 5514 in / 1525 out tokens · 45818 ms · 2026-05-10T05:18:08.819895+00:00 · methodology

