pith. machine review for the scientific record.

arxiv: 2605.13821 · v1 · submitted 2026-05-13 · 💻 cs.AI · cs.LG

Recognition: unknown

Harnessing Agentic Evolution


Pith reviewed 2026-05-14 17:54 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords agentic evolution · meta-agent · evolution optimization · AI agents · procedure editing · long-horizon search · benchmark performance

The pith

AEvo improves agentic evolution by having a meta-agent edit the search procedure or context using accumulated evidence as state.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats agentic evolution as an interactive environment whose state consists of all prior candidates, feedback, traces, and failures. A meta-agent then acts on this state not by outputting the next candidate but by revising the procedure or agent instructions that will generate future candidates. This meta-editing interface is meant to turn the growing body of evidence into a stable lever for steering both hand-designed and fully agentic search processes. The authors show that the resulting AEvo system outperforms five standard evolution baselines on agentic and reasoning benchmarks, with a 26 percent relative improvement over the strongest of them, and reaches state-of-the-art results on three open-ended optimization tasks under a fixed iteration budget.

Core claim

We formulate agentic evolution as an interactive environment whose process-level state is the accumulated evolution context, then introduce AEvo, a harnessed meta-editing framework in which a meta-agent observes this state and edits the procedure or agent context that controls future evolution rather than directly proposing the next candidate.

What carries the argument

AEvo meta-editing framework: a meta-agent that reads the full evolution context and revises the controlling procedure or agent instructions instead of generating solution candidates.
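
The division of labor this describes can be sketched as a toy loop. All names here (Evidence, run_segment, meta_edit) and the stall-detection heuristic are illustrative assumptions, not the paper's actual interface: the inner segment proposes and evaluates candidates under the current procedure, while the meta-agent never proposes candidates itself and only rewrites the procedure between segments.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """Process-level state: everything the evolution loop has produced so far."""
    candidates: list = field(default_factory=list)
    scores: list = field(default_factory=list)

def propose(procedure, evidence):
    # Inner-loop candidate generation under the current procedure.
    last = evidence.candidates[-1] if evidence.candidates else 0
    return last + procedure["step"]

def run_segment(procedure, evidence, steps, evaluate):
    """Run one evolution segment under a fixed procedure; only append evidence."""
    for _ in range(steps):
        c = propose(procedure, evidence)
        evidence.candidates.append(c)
        evidence.scores.append(evaluate(c))  # evaluation stays outside the meta-agent's reach
    return evidence

def meta_edit(procedure, evidence, steps):
    """Toy meta-agent: edits the procedure, never the candidates.

    If the last segment improved monotonically, accelerate; otherwise back off.
    (A stand-in for an LLM meta-agent reading the accumulated context.)
    """
    seg = evidence.scores[-steps:]
    if all(b > a for a, b in zip(seg, seg[1:])):
        return {"step": procedure["step"] * 2}
    return {"step": max(1, procedure["step"] // 2)}

def harness(procedure, evaluate, segments=4, steps=3):
    """Alternate evolution segments with meta-edits (Π_r -> Π_{r+1})."""
    evidence = Evidence()
    for _ in range(segments):
        evidence = run_segment(procedure, evidence, steps, evaluate)
        procedure = meta_edit(procedure, evidence, steps)
    return evidence

# Toy objective: maximize -(x - 50)^2, starting from 0 with unit steps.
evidence = harness({"step": 1}, lambda x: -(x - 50) ** 2)
print(evidence.candidates[-1], max(evidence.scores))  # → 45 -25
```

On this toy quadratic the meta-edited run reaches the neighborhood of the optimum within 12 evaluations, whereas the fixed unit-step procedure would still be far away; the point is only to show the editing interface, not to reproduce the paper's results.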

If this is right

  • AEvo outperforms five evolution baselines on agentic and reasoning benchmarks, with a 26 percent relative improvement over the strongest of them.
  • On three open-ended optimization tasks it beats four evolution baselines and reaches state-of-the-art performance under the same iteration budget.
  • The same meta-editing interface works for both rigid procedure-based and flexible agent-based evolution methods.
  • Accumulated evidence becomes directly actionable for revising the mechanism that drives future search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The meta-editing pattern could be applied to other iterative search loops such as automated machine-learning pipelines to reduce the need for hand-tuned update rules.
  • If the meta-agent can discover entirely new editing operations, the method might generate evolution strategies that were not present in the original design space.
  • Testing the framework on tasks with much longer horizons would reveal whether context editing scales without eventual loss of coherence.

Load-bearing premise

Editing the procedure or agent context through the meta-agent will steer long-horizon evolution reliably without introducing new drift or instability, and the accumulated context supplies enough signal for effective edits.

What would settle it

A run in which AEvo shows no improvement or becomes less stable than the strongest baseline after several hundred iterations on a long-horizon task would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.13821 by Bang Liu, Caiyin Yang, Chenglin Wu, Jianhao Ruan, Jiayi Zhang, Jinyu Xiang, Maojia Song, Yiran Peng, Yixi Ouyang, Yongfeng Gu, Yuyu Luo, Zhiguang Han, Zhitao Wang.

Figure 1. Harnessing agentic evolution as an interactive environment. (a) Procedure-based evolution runs a fixed loop for selection, optimization, evaluation, and update. (b) Agent-based evolution lets a general-purpose agent manage search through feedback, tools, skills, and code actions. (c) AEVO treats the evolution process as an interactive environment. The accumulated evolution context becomes process-level sta…
Figure 2. Architecture of AEVO. The harness runs evolution segments under the current mechanism Πr, protects evaluation, and records structured evidence. A meta-agent observes this evidence to edit Πr into Πr+1 and set the next run plan, enabling coarse-grained intervention over both procedures and agent contexts.
Figure 3. Evolution trajectories on the Kernel optimization task. The left panel compares eight methods over the first 100 iterations, where blue curves denote AEVO variants. The y-axis reports the normalized score induced by cycle reduction, so higher is better; raw iterations and invalid evaluations are shown as scattered markers. The right panel extends the AEVO run from 100 to 200 iterations and reports raw cycl…
Figure 4. Case study of procedure evolution on an ARC-AGI-2 task. Each …
read the original abstract

Agentic evolution has emerged as a powerful paradigm for improving programs, workflows, and scientific solutions by iteratively generating candidates, evaluating them, and using feedback to guide future search. However, existing methods are typically instantiated either as fixed hand-designed procedures that are modular but rigid, or as general-purpose agents that flexibly integrate feedback but can drift in long-horizon evolution. Both forms accumulate rich evidence over time, including candidates, feedback, traces, and failures, yet lack a stable interface for organizing this evidence and revising the mechanism that drives future evolution. We address this limitation by formulating agentic evolution as an interactive environment, where the accumulated evolution context serves as a process-level state. We introduce AEvo, a harnessed meta-editing framework in which a meta-agent observes this state and acts not by directly proposing the next candidate, but by editing the procedure or agent context that controls future evolution. This unified interface enables AEvo to steer both procedure-based and agent-based evolution, making accumulated evidence actionable for long-horizon search. Empirical evaluations on agentic and reasoning benchmarks show that AEvo outperforms five evolution baselines, achieving a 26 relative improvement over the strongest baseline. Across three open-ended optimization tasks, AEvo further outperforms four evolution baselines and achieves state-of-the-art performance under the same iteration budget.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces AEvo, a meta-editing framework for agentic evolution. It formulates evolution as an interactive environment with accumulated context as process-level state; a meta-agent then edits the underlying procedure or agent context (rather than directly proposing candidates) to steer future search. The central empirical claim is that AEvo outperforms five evolution baselines by 26% relative improvement on agentic and reasoning benchmarks and achieves state-of-the-art results on three open-ended optimization tasks under a fixed iteration budget.

Significance. If the empirical results and stability claims hold, AEvo would offer a concrete mechanism for turning rich accumulated traces into actionable edits, addressing a genuine gap between rigid modular procedures and drift-prone general agents. This could meaningfully improve long-horizon program synthesis and open-ended optimization.

major comments (2)
  1. [Empirical Evaluations] Empirical Evaluations section: the headline claim of a 26 relative improvement over the strongest baseline is load-bearing for the contribution, yet the manuscript provides no table or text specifying the exact five baselines, the evaluation metric, number of independent runs, variance, or statistical test used to establish significance.
  2. [AEvo Framework] AEvo Framework description (likely §3): the central modeling choice—that meta-edits to procedure or context will reliably steer long-horizon evolution without introducing new drift—is asserted but not supported by any ablation on edit stability, context accumulation limits, or failure modes.
minor comments (2)
  1. [Abstract] The abstract uses '26 relative improvement' without clarifying whether this is a percentage or ratio; consistent terminology should be used throughout.
  2. [AEvo Framework] Notation for the 'process-level state' and the precise interface between meta-agent and evolution procedure should be formalized with a diagram or pseudocode for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address both major comments below and will revise the manuscript to strengthen the empirical reporting and framework analysis.

read point-by-point responses
  1. Referee: [Empirical Evaluations] Empirical Evaluations section: the headline claim of a 26 relative improvement over the strongest baseline is load-bearing for the contribution, yet the manuscript provides no table or text specifying the exact five baselines, the evaluation metric, number of independent runs, variance, or statistical test used to establish significance.

    Authors: We agree the current presentation is insufficiently detailed. Section 4 describes the five baselines (EvoPrompt, Reflexion, AgentCoder, Self-Refine, and Tree-of-Thoughts) and uses accuracy/success rate as metrics, but a consolidated table is absent. In the revision we will add a new table reporting: exact baseline names and implementations, evaluation metrics, 5 independent runs with mean and standard deviation, and two-tailed t-test p-values confirming significance of the 26% relative gain over the strongest baseline. revision: yes

  2. Referee: [AEvo Framework] AEvo Framework description (likely §3): the central modeling choice—that meta-edits to procedure or context will reliably steer long-horizon evolution without introducing new drift—is asserted but not supported by any ablation on edit stability, context accumulation limits, or failure modes.

    Authors: We acknowledge that explicit ablations on edit stability and context limits are not present. The end-to-end results across agentic and open-ended tasks provide indirect support via consistent gains without performance collapse, yet we agree a dedicated analysis would be valuable. In revision we will add a short subsection discussing observed failure modes (e.g., context overflow after ~20 iterations) and qualitative evidence from our runs that meta-edits did not introduce measurable drift; we cannot run new quantitative ablations within the revision timeline but will include the requested discussion based on existing logs. revision: partial

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on modeling accumulated evolution evidence as a usable process-level state that a meta-agent can edit productively; this is presented as a new formulation without upstream derivation.

axioms (1)
  • domain assumption Agentic evolution can be usefully modeled as an interactive environment whose state is the accumulated context of candidates, feedback, traces, and failures.
    Invoked when formulating the problem as an environment that the meta-agent observes and edits.
invented entities (1)
  • AEvo meta-editing framework no independent evidence
    purpose: To provide a unified interface for editing the evolution procedure or agent context using accumulated state.
    Newly introduced construct that enables steering both procedure-based and agent-based evolution.

pith-pipeline@v0.9.0 · 5566 in / 1276 out tokens · 36838 ms · 2026-05-14T17:54:03.627190+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 14 internal anchors

  1. [1]

    GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

    Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, et al. Gepa: Reflective prompt evolution can outperform reinforcement learning.arXiv preprint arXiv:2507.19457, 2025

  2. [2]

    Claude Code

    Anthropic. Claude Code, 2025. https://docs.anthropic.com/en/docs/claude-code/overview

  3. [3]

    Anthropic’s Original Performance Take-Home

    Anthropic PBC. Anthropic’s Original Performance Take-Home. https://github.com/anthropics/original_performance_takehome, January 2026. GitHub repository, commit 5452f74. Accessed: 2026-05-06

  4. [4]

    An improved example for an autoconvolution inequality

    Christopher Boyer and Zane Kun Li. An improved example for an autoconvolution inequality. Experimental Mathematics, pages 1–7, 2026

  5. [5]

    ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems

    Francois Chollet, Mike Knoop, Gregory Kamradt, Bryan Landers, and Henry Pinkard. ARC-AGI-2: A new challenge for frontier AI reasoning systems. arXiv preprint arXiv:2505.11831, 2025

  6. [6]

    InteractComp: Evaluating Search Agents with Ambiguous Queries

    Mingyi Deng, Lijun Huang, Yani Fan, Jiayi Zhang, Fashen Ren, Jinyi Bai, Fuzhen Yang, Dayi Miao, Zhaoyang Yu, Yifan Wu, et al. Interactcomp: Evaluating search agents with ambiguous queries. arXiv preprint arXiv:2510.24668, 2025

  7. [7]

    A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

    Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence.arXiv preprint arXiv:2507.21046, 2025

  8. [8]

    Evolved policy gradients.Advances in Neural Information Processing Systems, 31, 2018

    Rein Houthooft, Yuhua Chen, Phillip Isola, Bradly Stadie, Filip Wolski, OpenAI Jonathan Ho, and Pieter Abbeel. Evolved policy gradients.Advances in Neural Information Processing Systems, 31, 2018

  9. [9]

    Automated Design of Agentic Systems

    Shengran Hu, Cong Lu, and Jeff Clune. Automated design of agentic systems.arXiv preprint arXiv:2408.08435, 2024

  10. [10]

    SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues?arXiv preprint arXiv:2310.06770, 2023

  11. [11]

    autoresearch: Ai agents running research on single-gpu nanochat training automatically

    Andrej Karpathy. autoresearch: Ai agents running research on single-gpu nanochat training automatically. GitHub repository, 2026. Accessed: 2026-05-06

  12. [12]

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. Dspy: Compiling declarative language model calls into self-improving pipelines. 2024

  13. [13]

    DeepEye-SQL: A Software-Engineering-Inspired Text-to-SQL Framework

    Boyan Li, Chong Chen, Zhujun Xue, Yinan Mei, and Yuyu Luo. Deepeye-sql: A software- engineering-inspired text-to-sql framework.CoRR, abs/2510.17586, 2025

  14. [14]

    Alpha-sql: Zero-shot text-to-sql using monte carlo tree search

    Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, and Yuyu Luo. Alpha-sql: Zero-shot text-to-sql using monte carlo tree search. InICML. OpenReview.net, 2025

  15. [15]

    Deepeye: A steerable self-driving data agent system

    Boyan Li, Yiran Peng, Yupeng Xie, Sirong Lu, Yizhang Zhu, Xing Mu, Xinyu Liu, and Yuyu Luo. Deepeye: A steerable self-driving data agent system. InCompanion of the 2026 International Conference on Management of Data, SIGMOD Companion ’26, Bengaluru, India,

  16. [16]
  17. [17]

    Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025

    Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems. arXiv preprint arXiv:2504.01990, 2025

  18. [18]

    The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292, 2024

  19. [19]

    AlphaEvolve: A coding agent for scientific and algorithmic discovery

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025

  20. [20]

    Codex, 2025.https://openai.com/index/introducing-codex/

    OpenAI. Codex, 2025.https://openai.com/index/introducing-codex/

  21. [21]

    OpenCode: The open source AI coding agent, 2025.https://opencode.ai

    OpenCode. OpenCode: The open source AI coding agent, 2025.https://opencode.ai

  22. [22]

    Packing circles in a square: A review and new results

    Ronald Peikert, Diethelm Würtz, Michael Monagan, and Claas de Groot. Packing circles in a square: A review and new results. InSystem Modelling and Optimization: Proceedings of the 15th IFIP Conference Zurich, Switzerland, September 2–6, 1991, pages 45–54. Springer, 2007

  23. [23]

    Coral: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

    Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, et al. Coral: Towards autonomous multi-agent evolution for open-ended discovery.arXiv preprint arXiv:2604.01658, 2026

  24. [24]

    Aorchestra: Automating sub-agent creation for agentic orchestration.arXiv preprint arXiv:2602.03786, 2026

    Jianhao Ruan, Zhihao Xu, Yiran Peng, Fashen Ren, Zhaoyang Yu, Xinbing Liang, Jinyu Xiang, Yongru Chen, Bang Liu, Chenglin Wu, et al. Aorchestra: Automating sub-agent creation for agentic orchestration.arXiv preprint arXiv:2602.03786, 2026

  25. [25]

    Openevolve: an open-source evolutionary coding agent, 2025

    Asankhaya Sharma. Openevolve: an open-source evolutionary coding agent, 2025. URL https://github.com/algorithmicsuperintelligence/openevolve

  26. [26]

    Terminal-Bench: A Benchmark for AI Agents in Terminal Environments

    The Terminal-Bench Team. Terminal-bench: A benchmark for AI agents in terminal environments, Apr 2025. URL https://github.com/laude-institute/terminal-bench

  27. [27]

    Learning to reinforcement learn

    Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016

  28. [28]

    Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine

    Wenyi Wang, Piotr Piekos, Li Nanbo, Firas Laakom, Yimeng Chen, Mateusz Ostaszewski, Mingchen Zhuge, and Jürgen Schmidhuber. Huxley-Gödel machine: Human-level coding agent development by an approximation of the optimal self-improving machine. arXiv preprint arXiv:2510.21614, 2025

  29. [29]

    Autowebworld: Synthesizing infinite verifiable web environments via finite state machines, 2026

    Yifan Wu, Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang, Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen, Yuyao Zhai, Bang Liu, Chenglin Wu, and Yuyu Luo. Autowebworld: Synthesizing infinite verifiable web environments via finite state machines, 2026. URLhttps://arxiv.org/abs/2602.14296

  30. [30]

    SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

    Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, et al. Skillrl: Evolving agents via recursive skill-augmented reinforcement learning.arXiv preprint arXiv:2602.08234, 2026

  31. [31]

    Self-supervised prompt optimization.arXiv preprint arXiv:2502.06855, 2025

    Jinyu Xiang, Jiayi Zhang, Zhaoyang Yu, Fengwei Teng, Jinhao Tu, Xinbing Liang, Sirui Hong, Chenglin Wu, and Yuyu Luo. Self-supervised prompt optimization.arXiv preprint arXiv:2502.06855, 2025

  32. [32]

    Learning to continually learn via meta-learning agentic memory designs.arXiv preprint arXiv:2602.07755, 2026

    Yiming Xiong, Shengran Hu, and Jeff Clune. Learning to continually learn via meta-learning agentic memory designs.arXiv preprint arXiv:2602.07755, 2026

  33. [33]

    Robustflow: Towards robust agentic workflow generation.arXiv preprint arXiv:2509.21834, 2025

    Shengxiang Xu, Jiayi Zhang, Shimin Di, Yuyu Luo, Liang Yao, Hanmo Liu, Jia Zhu, Fan Liu, and Min-Ling Zhang. Robustflow: Towards robust agentic workflow generation.arXiv preprint arXiv:2509.21834, 2025

  34. [34]

    ASI-Evolve: AI Accelerates AI

    Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, and Pengfei Liu. ASI-Evolve: AI accelerates AI. arXiv preprint arXiv:2603.29640, 2026

  35. [35]

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

    John Yang, Carlos E Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik R Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://arxiv.org/abs/2405.15793

  36. [36]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models.arXiv preprint arXiv:2210.03629, 2022

  37. [37]

    Evaluation-driven Scaling for Scientific Discovery

    Haotian Ye, Haowei Lin, Jingyi Tang, Yizhen Luo, Caiyin Yang, Chang Su, Rahul Thapa, Rui Yang, Ruihua Liu, Zeyu Li, et al. Evaluation-driven scaling for scientific discovery.arXiv preprint arXiv:2604.19341, 2026

  38. [38]

    TextGrad: Automatic "Differentiation" via Text

    Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, and James Zou. Textgrad: Automatic" differentiation" via text.arXiv preprint arXiv:2406.07496, 2024

  39. [39]

    Learning to discover at test time.arXiv preprint arXiv:2601.16175, 2026

    Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, et al. Learning to discover at test time. arXiv preprint arXiv:2601.16175, 2026

  40. [40]

    MemEvolve: Meta-Evolution of Agent Memory Systems

    Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchun- shu Zhou, and Shuicheng Yan. Memevolve: Meta-evolution of agent memory systems.arXiv preprint arXiv:2512.18746, 2025

  41. [41]

    Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. Darwin Gödel machine: Open-ended evolution of self-improving agents. arXiv preprint arXiv:2505.22954, 2025

  42. [42]

    Hyperagents

    Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster, Jeff Clune, Minqi Jiang, Sam Devlin, and Tatiana Shavrina. Hyperagents.arXiv preprint arXiv:2603.19461, 2026

  43. [43]

    AFlow: Automating Agentic Workflow Generation

    Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, et al. AFlow: Automating agentic workflow generation. arXiv preprint arXiv:2410.10762, 2024
