pith. machine review for the scientific record.

arxiv: 2604.14655 · v2 · submitted 2026-04-16 · 💻 cs.AI · cs.LG

Recognition: no theorem link

AgentGA: Evolving Code Solutions in Agent-Seed Space

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:08 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords agent seed optimization · genetic algorithm · autonomous code generation · parent archive inheritance · tabular AutoML · Kaggle benchmark · long-horizon agents · elite tournament selection

The pith

Optimizing the agent seed—starting prompt plus inherited parent archives—lets a genetic algorithm evolve superior autonomous code-generation runs for tabular AutoML.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether searching over reusable starting conditions for agent runs, rather than editing generated code, improves outcomes on real benchmarks. AgentGA runs a population-level genetic algorithm that launches fresh agent executions each generation while passing selected archives forward so descendants can inspect and reuse prior artifacts. On the 16-competition Weco-Kaggle Lite suite this yields an average 71.90 percent exceed-human rate and wins 15 competitions outright, with archive-conditioned descendants beating de-novo proposals in 51.9 percent of direct tournaments. The results indicate that treating the initial seed as the evolvable unit is a workable design choice when agents already perform long-horizon autonomous search.

Core claim

AgentGA couples a genetic algorithm with long-horizon agents by evolving the agent seed—the task prompt together with optional parent archives that initialize each fresh workspace—rather than mutating code directly. Selection proceeds via deterministic 1:1 elite tournaments, operator allocation adapts online via a modified Hedge controller, and each generation launches an isolated autonomous run. On the full 16-competition benchmark the method records a 71.90 percent average exceed-human rate against 51.38 percent for the AIDE reference and wins 15 of 16 tasks; within runs, archive-inheriting descendants prevail in 51.9 percent of 1,680 parent-child tournaments while de novo proposals win only 8.6 percent.
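To make the outer loop concrete, here is a minimal Python sketch of the evolution described above. Every interface is an assumption: the real long-horizon agent, the benchmark harness, the operator set, and the exact form of the paper's modified Hedge controller are not published here, so `Seed`, `run_agent`, and the textbook Hedge update are illustrative stand-ins only.

```python
import math
import random
from dataclasses import dataclass, field

# Minimal sketch of the outer loop, under assumed interfaces; not the paper's code.

@dataclass
class Seed:
    prompt: str                                                # task prompt
    parent_archives: list[str] = field(default_factory=list)   # inherited artifact paths

@dataclass
class Individual:
    seed: Seed
    score: float = 0.0                                         # e.g. exceed-human rate on the task
    archive: str = ""                                          # artifacts this run leaves behind

def run_agent(seed: Seed) -> tuple[float, str]:
    """Placeholder for one fresh autonomous code-generation run in an isolated workspace."""
    return random.random(), f"archive<{seed.prompt[:24]}>"

OPERATORS = ["mutate_prompt", "crossover", "de_novo"]

def hedge_weights(losses: list[float], eta: float = 0.5) -> list[float]:
    """Textbook Hedge (multiplicative weights) over operators; the paper uses a modified variant."""
    w = [math.exp(-eta * l) for l in losses]
    total = sum(w)
    return [x / total for x in w]

def make_child(op: str, parent: Individual, population: list["Individual"], task: str) -> Seed:
    if op == "de_novo":
        return Seed(prompt=task)                               # fresh proposal, no inheritance
    if op == "mutate_prompt":
        return Seed(prompt=parent.seed.prompt + " Refine the best prior pipeline.",
                    parent_archives=[parent.archive])
    other = random.choice(population)                          # crossover: child may inspect two archives
    return Seed(prompt=parent.seed.prompt, parent_archives=[parent.archive, other.archive])

def evolve(task: str, pop_size: int = 6, generations: int = 4) -> Individual:
    population = [Individual(Seed(task)) for _ in range(pop_size)]
    for ind in population:
        ind.score, ind.archive = run_agent(ind.seed)
    losses = [0.0] * len(OPERATORS)
    for _ in range(generations):
        weights = hedge_weights(losses)
        for i, parent in enumerate(list(population)):
            op = random.choices(OPERATORS, weights=weights)[0]
            child = Individual(make_child(op, parent, population, task))
            child.score, child.archive = run_agent(child.seed)
            # Deterministic 1:1 elite tournament: the child replaces its parent only if it scores higher.
            if child.score > parent.score:
                population[i] = child
                losses[OPERATORS.index(op)] -= 0.1             # operator produced a tournament winner
            else:
                losses[OPERATORS.index(op)] += 0.1             # operator lost; Hedge down-weights it
    return max(population, key=lambda ind: ind.score)

if __name__ == "__main__":
    best = evolve("Solve the tabular-playground-series-aug-2022 competition.")
    print(round(best.score, 3), best.seed.parent_archives)
```

The point of the sketch is the division of labor: the inner `run_agent` call does all code generation autonomously, while the outer loop only decides which prompts and which inherited archives each fresh run starts from.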

What carries the argument

The agent seed: the task prompt plus optional parent archives that initialize a fresh workspace, which serves as the evolvable unit allowing inheritance of artifacts across generations without direct code editing.
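As one illustration of how such a seed could initialize a run, the following sketch stages inherited archives inside a fresh isolated workspace. The directory layout, file names, and the `init_workspace` helper are assumptions for illustration, not the paper's implementation.

```python
import shutil
import tempfile
from pathlib import Path

# Illustrative only: the paper does not specify its workspace layout or agent entry point.

def init_workspace(task_prompt: str, parent_archives: list[Path]) -> Path:
    """Create a fresh isolated workspace and stage inherited artifacts for the descendant to inspect."""
    workspace = Path(tempfile.mkdtemp(prefix="agentga_run_"))
    (workspace / "PROMPT.md").write_text(task_prompt)          # the seed's task prompt
    inherited = workspace / "inherited"
    inherited.mkdir()
    for i, archive in enumerate(parent_archives):
        if archive.is_dir():                                   # prior run's code, logs, and models
            shutil.copytree(archive, inherited / f"parent_{i}")
    return workspace
```

The descendant's run then starts from the prompt plus the staged artifacts, with no direct edits to the parent's code.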

If this is right

  • Descendants that inherit parent archives outperform fresh proposals in the majority of direct comparisons.
  • A population-level genetic algorithm with elite tournaments and online operator allocation can be applied to long-horizon autonomous agents.
  • Agent-seed optimization constitutes a practical design choice for autonomous code-search systems on tabular AutoML tasks.
  • The approach wins 15 of 16 competitions on the Weco-Kaggle Lite benchmark at an average 71.90 percent exceed-human rate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same seed-evolution pattern could be tested on domains other than tabular data, such as image or text generation pipelines, to check whether workspace inheritance transfers.
  • If archive reuse proves robust, future systems might reduce population size by focusing compute on promising inherited lineages rather than broad random restarts.
  • Separating prompt evolution from workspace-state evolution would clarify which component drives most of the observed tournament advantage.
  • The method suggests that agent frameworks could benefit from explicit mechanisms to log and replay successful initialization states across independent runs.

Load-bearing premise

The reported performance gains arise specifically from optimizing and inheriting agent seeds rather than from unstated details of the underlying agent implementation, benchmark tuning, or selective reporting of runs.

What would settle it

A controlled experiment that keeps the same agent implementation and total compute budget but disables seed optimization and inheritance, then measures whether the exceed-human rate falls back to the 51.38 percent reference level.
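One way to read that experiment as code: a self-contained sketch in which both arms spend the same number of fresh agent runs per competition, with seed optimization and inheritance switched off in the control arm. The greedy treatment arm, the 24-run budget, and the metric placeholder are assumptions standing in for the paper's full genetic algorithm and benchmark harness.

```python
import random
import statistics

# Illustrative ablation harness; competitions, budget, and the metric are placeholders.
COMPETITIONS = ["tabular-playground-series-aug-2022"]         # stand-in for the 16-task suite
TOTAL_RUN_BUDGET = 24                                         # fresh agent runs allowed per competition

def fresh_agent_run(task: str, inherited_archives: list[str]) -> tuple[float, str]:
    """Placeholder for one long-horizon autonomous run; returns (score, archive identifier)."""
    return random.random(), f"archive<{task[:16]}|{len(inherited_archives)} parents>"

def condition(task: str, seed_evolution: bool) -> float:
    """Best score under a fixed run budget, with seed optimization on or off."""
    if not seed_evolution:
        # Control arm: independent de novo runs, best-of-N, no inheritance.
        return max(fresh_agent_run(task, [])[0] for _ in range(TOTAL_RUN_BUDGET))
    # Treatment arm: greedy stand-in for the outer GA; each run inherits the best archive so far.
    best_score, best_archive = fresh_agent_run(task, [])
    for _ in range(TOTAL_RUN_BUDGET - 1):
        score, archive = fresh_agent_run(task, [best_archive])
        if score > best_score:                                 # deterministic 1:1 parent-child tournament
            best_score, best_archive = score, archive
    return best_score

def exceeds_human_rate(score: float) -> float:
    """Placeholder for the benchmark's 'Exceeds % of Human' metric."""
    return 100.0 * score

for comp in COMPETITIONS:
    task = f"Solve {comp}."
    print(comp,
          round(exceeds_human_rate(condition(task, True)), 1),
          round(exceeds_human_rate(condition(task, False)), 1))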

Figures

Figures reproduced from arXiv: 2604.14655 by David Y.Y. Tan, Jingxian Zhang, Kellie Chin.

Figure 1. Two-layer architecture. Evolution assigns a task type and parent archives to each fresh … (full image at source)
Figure 2. Tournament outcomes by task across these benchmark runs. Each bar shows the share … (full image at source)
Figure 3. Best-score progression over the tabular-playground-series-aug-2022 run. (full image at source)
Figure 4. Parent-child structure for the tabular-playground-series-aug-2022 run, showing explicit … (full image at source)
read the original abstract

We present AgentGA, a framework that evolves autonomous code-generation runs by optimizing the agent seed: the task prompt plus optional parent archives that initialize a fresh workspace. The outer loop searches over these reusable starting conditions rather than editing code directly. Each generation launches a fresh autonomous run in an isolated workspace, while selected parent archives provide inherited artifacts that descendants can inspect and reuse. AgentGA couples a population-level genetic algorithm with long-horizon agents; selection uses deterministic 1:1 elite tournaments and operator allocation is adapted online with a modified Hedge controller. We instantiate the approach for tabular AutoML on the 16-competition Weco-Kaggle Lite benchmark. Across the full benchmark, AgentGA averages 71.90% Exceeds % of Human versus 51.38% for the AIDE reference, winning 15/16 competitions. Within AgentGA runs, descendants conditioned on inherited parent archives win 51.9% of 1,680 parent-child tournaments versus 8.6% for de novo proposals. These results support agent-seed optimization as a practical design choice for autonomous code-search systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper presents AgentGA, a framework that evolves autonomous code-generation agents by optimizing the agent seed (task prompt plus optional parent archives) via an outer genetic algorithm loop rather than direct code edits. Each generation runs a fresh long-horizon agent in an isolated workspace, with selection via deterministic 1:1 elite tournaments and online Hedge-based operator adaptation. On the 16-competition Weco-Kaggle Lite tabular AutoML benchmark, AgentGA achieves 71.90% average Exceeds % of Human versus 51.38% for the AIDE reference (winning 15/16 competitions); within AgentGA, inherited-archive descendants win 51.9% of 1,680 parent-child tournaments versus 8.6% for de novo proposals.

Significance. If the base-agent equivalence and experimental controls hold, the work provides evidence that optimizing reusable starting conditions can improve performance in long-horizon code-search systems, offering a practical alternative or complement to direct code mutation. The internal tournament design supplies a within-method control that partially isolates the inheritance effect, and the deterministic selection plus adaptive operators are clear methodological strengths.

major comments (3)
  1. [Abstract] The central claim that performance gains arise from agent-seed optimization requires that the underlying long-horizon agent, workspace isolation, and benchmark execution are identical between AgentGA and AIDE except for the outer GA loop and inheritance mechanism. No such confirmation or base-agent description is supplied, so the 71.90% vs 51.38% comparison cannot be attributed specifically to seed evolution.
  2. [Abstract] The headline results (71.90% Exceeds % of Human, 15/16 wins) are stated without any experimental protocol, run-selection criteria, statistical tests, or description of how runs were chosen or excluded. This absence makes the benchmark outcomes impossible to assess for support of the central claim.
  3. [Abstract] The internal 51.9% vs 8.6% tournament statistic is computed only within AgentGA runs and therefore does not address potential confounds in the cross-method comparison to AIDE.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief sentence clarifying the precise definition of 'Exceeds % of Human' and the construction of the Weco-Kaggle Lite benchmark.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for clearer experimental controls and protocol details. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central claim that performance gains arise from agent-seed optimization requires that the underlying long-horizon agent, workspace isolation, and benchmark execution are identical between AgentGA and AIDE except for the outer GA loop and inheritance mechanism. No such confirmation or base-agent description is supplied, so the 71.90% vs 51.38% comparison cannot be attributed specifically to seed evolution.

    Authors: We agree that explicit confirmation of base-agent equivalence is required to support attribution of gains to seed optimization. The manuscript positions AgentGA as an outer loop around the same long-horizon autonomous code-generation agent and isolated workspace used by the AIDE reference, with the sole additions being the genetic algorithm over seeds and parent-archive inheritance. To make this equivalence unambiguous, we will add a dedicated subsection in the Methods section that describes the base agent configuration, workspace isolation protocol, and benchmark execution harness, together with an explicit statement that these elements are held fixed between the two methods. revision: yes

  2. Referee: [Abstract] The headline results (71.90% Exceeds % of Human, 15/16 wins) are stated without any experimental protocol, run-selection criteria, statistical tests, or description of how runs were chosen or excluded. This absence makes the benchmark outcomes impossible to assess for support of the central claim.

    Authors: We acknowledge that the current manuscript does not supply a full experimental protocol, run-selection criteria, or statistical analysis in the abstract or main text. The reported figures reflect single deterministic runs per competition for each method. We will insert a new Experimental Setup section that details the run protocol, any exclusion rules, computational budget, and the rationale for omitting formal statistical tests (driven by the high cost of long-horizon agent executions). This addition will allow readers to evaluate the strength of the benchmark comparison directly. revision: yes

  3. Referee: [Abstract] The internal 51.9% vs 8.6% tournament statistic is computed only within AgentGA runs and therefore does not address potential confounds in the cross-method comparison to AIDE.

    Authors: The referee correctly notes that the parent-child tournament results are computed exclusively inside AgentGA runs and therefore cannot serve as a control for confounds between AgentGA and AIDE. This internal statistic is intended only to isolate the contribution of archive inheritance within our own method. The primary cross-method evidence rests on the assumption of identical base agents and benchmarks, which we will make explicit in the revised Methods section as described in our response to the first comment. We will also revise the text to state the limited scope of the tournament analysis. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark comparisons with no derivations or self-referential chains

full rationale

The paper contains no equations, derivations, fitted parameters, or mathematical claims. All results are direct empirical measurements (win rates, tournament outcomes) on the Weco-Kaggle Lite benchmark against an external reference (AIDE). No self-citations, ansatzes, uniqueness theorems, or renamings of known results appear in the provided text. The central claim is an observed performance difference, not a reduction of any output to its inputs by construction. This is the normal case for an empirical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities beyond the high-level framing of agent-seed space; lower-level details are absent from the reviewed text.

pith-pipeline@v0.9.0 · 5494 in / 1002 out tokens · 135314 ms · 2026-05-12T04:08:02.870567+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 6 internal anchors

  1. [1]

    SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning

    Yizhou Chi, Yizhang Lin, Sirui Hong, Duyi Pan, Yaying Fei, Guanghao Mei, Bangbang Liu, Tianqi Pang, Jacky Kwok, Ceyao Zhang, Bang Liu, and Chenglin Wu. SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning, October 2024. URL http://arxiv.org/abs/2410.17238. arXiv:2410.17238 [cs]

  2. [2]

    AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

    Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data, March 2020. URL http://arxiv.org/abs/2003.06505. arXiv:2003.06505 [stat]

  3. [3]

    MLZero: A Multi-Agent System for End-to-end Machine Learning Automation

    Haoyang Fang, Boran Han, Nick Erickson, Xiyuan Zhang, Su Zhou, Anirudh Dagar, Jiani Zhang, Ali Caner Turkmen, Cuixiong Hu, Huzefa Rangwala, Ying Nian Wu, Bernie Wang, and George Karypis. MLZero: A Multi-Agent System for End-to-end Machine Learning Automation. In Advances in Neural Information Processing Systems 38 (NeurIPS), 2025. doi:10.48550/arX...

  4. [4]

    Efficient and Robust Automated Machine Learning

    Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. Efficient and Robust Automated Machine Learning . In Advances in Neural Information Processing Systems , volume 28. Curran Associates, Inc., 2015. URL https://papers.nips.cc/paper/2015/hash/11d0e6287202fced83f79975ec59a3a6-Abstract.html

  5. [5]

    Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning

    Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning. Journal of Machine Learning Research, 23(261): 1--61, 2022. ISSN 1533-7928. URL http://jmlr.org/papers/v23/21-0992.html

  6. [6]

    A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

    Yoav Freund and Robert E Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 55(1): 119--139, August 1997. ISSN 0022-0000. doi:10.1006/jcss.1997.1504. URL https://www.sciencedirect.com/science/article/pii/S002200009791504X

  7. [7]

    Antoine Grosnit, Alexandre Maraval, Refinath S. N, Zichao Zhao, James Doran, Giuseppe Paolo, Albert Thomas, Jonas Gonzalez, Abhineet Kumar, Khyati Khandelwal, Abdelhakim Benechehab, Hamza Cherkaoui, Youssef Attia El-Hili, Kun Shao, Jianye Hao, Jun Yao, Balázs Kégl, Haitham Bou-Ammar, and Jun Wang. Kolb- Based Experiential Learning for Generalist Agents wi...

  8. [8]

    EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers

    Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers. In Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024a. doi:10.48550/arXiv.2309.08532. URL http://arxiv.org/abs/2...

  9. [9]

    DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

    Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, and Jun Wang. DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning. In Proceedings of the 41st International Conference on Machine Learning (ICML), pages 16813--16848, 2024b. doi:10.48550/arXiv.2402.17453. URL http://arxiv.org/abs/2402.17453

  10. [10]

    Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

    Noah Hollmann, Samuel Müller, and Frank Hutter. Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023. doi:10.48550/arXiv.2305.03403. URL http://arxiv.org/abs/2305.03403

  11. [11]

    Data Interpreter: An LLM Agent for Data Science

    Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Ceyao Zhang, Chenxing Wei, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zhibin Gou, Zongze Xu, and Chenglin Wu. Data Interprete...

  12. [12]

    Automated Design of Agentic Systems

    Shengran Hu, Cong Lu, and Jeff Clune. Automated design of agentic systems, 2025. URL https://arxiv.org/abs/2408.08435

  13. [13]

    AIDE: AI-Driven Exploration in the Space of Code

    Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, and Yuxiang Wu. AIDE: AI-Driven Exploration in the Space of Code, February 2025. URL http://arxiv.org/abs/2502.13138. arXiv:2502.13138 [cs]

  14. [14]

    Competition-Level Code Generation with AlphaCode

    Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d’Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushm...

  15. [15]

    AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

    Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, and Ge Zhang. AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions. In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025. ...

  16. [16]

    I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search

    Zujie Liang, Feng Wei, Wujiang Xu, Lin Chen, Yuxi Qian, and Xinhui Wu. I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search, February 2025. URL http://arxiv.org/abs/2502.14693. arXiv:2502.14693 [cs]

  17. [17]

    SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

    Jiaye Lin, Yifu Guo, Yuzhen Han, Sen Hu, Ziyi Ni, Licheng Wang, Mingguang Chen, Hongzhang Liu, Ronghao Chen, Yangfan He, Daxin Jiang, Binxing Jiao, Chen Hu, and Huacan Wang. Se-agent: Self-evolution trajectory optimization in multi-step reasoning with llm-based agents, 2025. URL https://arxiv.org/abs/2508.02085

  18. [18]

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. AlphaEvolve : A coding agent for scientific and algor...

  19. [19]

    Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

    Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 485--492, New York, NY, USA, July 2016. Association for Computing Machinery. ISBN 978-1-4503-4206-3. doi:...

  20. [20]

    Mathematical Discoveries from Program Search with Large Language Models

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical discoveries from program search with large language models. Nature, 625(7995): 468--475, January 2024. ISSN 1476-468...

  21. [21]

    AgentSquare: Automatic LLM Agent Search in Modular Design Space

    Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, and Yong Li. Agentsquare: Automatic llm agent search in modular design space, 2025. URL https://arxiv.org/abs/2410.06153

  22. [22]

    Kimi K2.5: Visual Agentic Intelligence

    Kimi Team. Kimi k2.5: Visual agentic intelligence, 2026. URL https://arxiv.org/abs/2602.02276

  23. [23]

    Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms

    Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '13, pages 847--855, New York, NY, USA, August 2013. Association for Computing Machin...

  24. [24]

    AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

    Patara Trirat, Wonyong Jeong, and Sung Ju Hwang. AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML. In Proceedings of the 42nd International Conference on Machine Learning (ICML), 2025. doi:10.48550/arXiv.2410.02958. URL http://arxiv.org/abs/2410.02958

  25. [25]

    Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing, 2026

    Zhaotian Weng, Antonis Antoniades, Deepak Nathani, Zhen Zhang, Xiao Pu, and Xin Eric Wang. Group-evolving agents: Open-ended self-improvement via experience sharing, 2026. URL https://arxiv.org/abs/2602.04837

  26. [26]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the 11th International Conference on Learning Representations (ICLR), 2023. URL http://arxiv.org/abs/2210.03629

  27. [27]

    ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution

    Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution. In Advances in Neural Information Processing Systems 37 (NeurIPS), 2024. doi:10.48550/arXiv.2402.01145. URL http://arxiv.org/abs/2402.01145

  28. [28]

    EvoFlow: Evolving Diverse Agentic Workflows on the Fly

    Guibin Zhang, Kaijie Chen, Guancheng Wan, Heng Chang, Hong Cheng, Kun Wang, Shuyue Hu, and Lei Bai. EvoFlow: Evolving diverse agentic workflows on the fly, 2025a. URL https://arxiv.org/abs/2502.07373

  29. [29]

    Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. Darwin godel machine: Open-ended evolution of self-improving agents, 2026. URL https://arxiv.org/abs/2505.22954

  30. [30]

    AFlow: Automating Agentic Workflow Generation

    Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin Wu. Aflow: Automating agentic workflow generation, 2025 b . URL https://arxiv.org/abs/2410.10762

  31. [31]

    MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks

    Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, and Yuqing Yang. MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 2931--2959, 2024. doi:10.48550/arXiv.2304.14979. URL http://arxiv.org/ab...