pith. machine review for the scientific record.

arxiv: 2604.15082 · v1 · submitted 2026-04-16 · 💻 cs.AR · cs.AI

Recognition: unknown

Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC

Cunxi Yu, Haoxing Ren

Pith reviewed 2026-05-10 09:48 UTC · model grok-4.3

classification 💻 cs.AR cs.AI
keywords logic synthesis · ABC tool · LLM agents · self-evolving systems · EDA optimization · autonomous code improvement · quality of results

The pith

LLM agents can autonomously rewrite sections of the full ABC logic synthesis codebase and discover new strategies that improve quality of results on standard benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework in which multiple LLM agents iteratively propose and apply code changes to the entire ABC logic synthesis system. Agents follow structured prompts to target sub-components such as logic minimization or technology mapping, then compile the modified code, verify functional correctness, and measure quality-of-results gains across ISCAS, VTR, EPFL, and IWLS benchmark suites. The process repeats in cycles that start from existing open-source components and build improvements without manual injection of new heuristics. A sympathetic reader cares because the approach claims to let a large, integrated EDA tool improve itself at million-line scale while keeping its original single-binary interface intact.
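The cycle described above — propose a patch, rebuild the single binary, gate on functional correctness, and keep only QoR improvements — can be sketched as a loop. This is an editorial illustration, not the authors' code: the three callables stand in for the LLM agents, the build system, and the benchmark harness, and "lower QoR is better" (e.g. AIG node count or mapped area) is an assumed convention.

```python
def evolution_cycle(propose_patch, rebuild, score_benchmarks, baseline_qor):
    """One propose -> compile -> verify -> score cycle.

    propose_patch()    stand-in for the LLM agents rewriting a sub-component
    rebuild()          stand-in for recompiling the integrated ABC binary
    score_benchmarks() stand-in for running the suites; returns an aggregate
                       QoR (lower is better) or None on any correctness failure
    """
    if not propose_patch():
        return None                  # agents produced no usable patch
    if not rebuild():
        return None                  # patches that break the build are rejected
    qor = score_benchmarks()
    if qor is None:
        return None                  # functional correctness is a hard gate
    return qor if qor < baseline_qor else None  # keep strict improvements only

def evolve(cycles, propose, rebuild, score, baseline):
    """Run several cycles; each accepted patch becomes the new baseline."""
    best = baseline
    for _ in range(cycles):
        result = evolution_cycle(propose, rebuild, score, best)
        if result is not None:
            best = result            # later cycles build on the evolved code
    return best
```

The point of the sketch is the ordering of the gates: a patch is scored for quality only after it compiles and behaves correctly, which is what lets the loop run unattended.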

Core claim

The central claim is that a team of LLM-based agents, operating under programming guidance prompts and a unified correctness-plus-QoR evaluation loop, can progressively rewrite and evolve specific sub-components of the ABC codebase. Each cycle generates modifications, rebuilds the integrated binary, validates behavior, and scores results on multi-suite benchmarks. Through this closed feedback process the system identifies optimizations beyond human-designed heuristics and learns new synthesis strategies.

What carries the argument

The multi-agent self-evolution loop that applies prompt-driven code rewrites to ABC sub-components, followed by compilation, correctness validation, and QoR measurement on benchmark suites.

If this is right

  • The evolved ABC binary produces higher quality-of-results than the starting version on ISCAS 85/89/99, VTR, EPFL, and IWLS 2005 suites.
  • New synthesis strategies emerge that were not present in the human-designed heuristics used at bootstrap.
  • The framework can continue through multiple evolution cycles while preserving ABC's single-binary execution model and command interface.
  • The same agent-driven rewrite process can be applied to other large open-source EDA components without changing their external interfaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the method scales to million-line codebases, similar agent teams could be pointed at other long-lived synthesis or verification tools to reduce manual maintenance effort.
  • Success would imply that prompt-based agents can discover domain-specific optimizations that human engineers have overlooked in complex software.
  • A practical next test would be to freeze the evolved code and evaluate it on entirely new benchmark families to check whether improvements generalize beyond the training loop.
  • Over longer horizons the same loop could be run continuously as new hardware targets appear, letting the tool adapt without human intervention.

Load-bearing premise

LLM agents guided only by prompts and an evaluation loop can reliably produce functionally correct code changes that deliver genuine QoR gains on the full ABC codebase without introducing subtle bugs or overfitting to the benchmarks.

What would settle it

Take the final evolved ABC binary and run it on a fresh collection of circuits drawn from a different source than the evolution benchmarks; if the reported QoR gains disappear or if the binary produces incorrect outputs on edge cases, the central claim fails.
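That settling test can be made concrete. In the sketch below, `cec` is ABC's actual combinational equivalence-checking command, but everything else — the `dc2` placeholder flow, the circuit names, and the `run_abc` stand-in for invoking the two binaries — is hypothetical, supplied for illustration only.

```python
def cec_command(flow, circuit):
    """Build an ABC command string: apply a synthesis flow to a circuit,
    then check the result for equivalence against the untouched input."""
    return f"read {circuit}; strash; {flow}; cec {circuit}; print_stats"

def generalizes(run_abc, evolved_flow, baseline_flow, held_out):
    """On circuits the evolution loop never saw, the evolved tool must
    (a) remain functionally correct and (b) keep its QoR edge.
    run_abc(flow, circuit) -> (is_equivalent, qor), lower qor is better."""
    for circuit in held_out:
        ok_evolved, qor_evolved = run_abc(evolved_flow, circuit)
        ok_base, qor_base = run_abc(baseline_flow, circuit)
        if not ok_evolved:
            return False             # incorrect output on a fresh circuit
        if ok_base and qor_evolved >= qor_base:
            return False             # the reported gains do not transfer
    return True
```

A `False` return on a held-out family is exactly the failure mode named above: either correctness breaks on an edge case or the QoR advantage evaporates off the evolution benchmarks.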

Figures

Figures reproduced from arXiv: 2604.15082 by Cunxi Yu, Haoxing Ren.

Figure 1. Overview of the multi-agent self-evolving framework for …

Figure 1 (caption continued). The first agent, responsible for optimization flow evolution, works within the flow-scheduling and pass-orchestration layer, primarily interacting with the FlowTune-integrated module located under src/opt/flowtune/. Its role is to evolve pass-selection heuristics, stopping criteria, and conditional flow steps, ensuring that any modifications are local to the FlowTune module to avoid interference with cor…

Figure 2. Automatically generated abcFlowTune7.c module partial code produced during evolution (cycle 7). Despite exposure to heterogeneous external research repositories during initialization, the agents overwhelmingly converge to the native ABC coding style when generating new C code. The patches they produce closely match ABC's formatting, naming conventions, commenting structure, and macro organization with striking fi…
read the original abstract

This paper introduces the first self-evolving logic synthesis framework, which leverages Large Language Model (LLM) agents to autonomously improve the source code of ABC, the widely adopted logic synthesis system. Our framework operates on the entire integrated ABC codebase, and the output repository preserves its single-binary execution model and command interface. In the initial evolution cycle, we bootstrap the system using existing prior open-source synthesis components, covering flow tuning, logic minimization, and technology mapping, but without manually injecting new heuristics. On top of this foundation, a team of LLM-based agents iteratively rewrites and evolves specific sub-components of ABC following our "programming guidance" prompts under a unified correctness and QoR-driven evaluation loop. Each evolution cycle proposes code modifications, compiles the integrated binary, validates correctness, and evaluates quality-of-results (QoR) on multi-suite benchmarks including ISCAS 85/89/99, VTR, EPFL, and IWLS 2005. Through continuous feedback, the system discovers optimizations beyond human-designed heuristics, effectively learning new synthesis strategies that enhance QoR. We detail the architecture of this self-improving system, its integration with ABC, and results demonstrating that the framework can autonomously and progressively improve EDA tool at full million-line scale.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a multi-agent LLM framework that autonomously evolves the full ABC logic synthesis codebase (preserving its single-binary interface) by iteratively rewriting sub-components via prompt-guided modifications. It bootstraps from prior open-source components and uses a unified loop of compilation, correctness validation, and QoR evaluation on ISCAS 85/89/99, VTR, EPFL, and IWLS 2005 suites, claiming to discover new synthesis strategies that progressively improve QoR beyond human-designed heuristics at million-line scale.

Significance. If the central claim holds with verifiable evidence, the work would be significant as the first demonstration of self-evolving EDA tools at the scale of a production synthesis system, potentially reducing reliance on manual heuristic tuning and enabling continuous optimization. The preservation of ABC's command interface and use of public multi-suite benchmarks are practical strengths that support reproducibility.

major comments (2)
  1. [Abstract] The manuscript states that the system 'discovers optimizations beyond human-designed heuristics' and supplies 'results demonstrating that the framework can autonomously and progressively improve EDA tool at full million-line scale,' yet it provides no quantitative QoR deltas, number of evolution cycles, specific code changes, or comparisons against the baseline ABC binary; this absence is load-bearing for the central claim.
  2. [Evaluation Loop] The correctness-plus-QoR loop relies exclusively on compilation and execution against the listed benchmark suites, without reported safeguards such as differential testing on unseen netlists, formal equivalence checking of modified passes, or coverage metrics on the evolved components; given the complexity of ABC's interacting C/C++ heuristics, this leaves open the possibility that reported gains reflect overfitting or undetected regressions rather than robust improvements.
minor comments (1)
  1. [Abstract] The abstract and text use 'programming guidance' in quotes without defining the exact prompt templates or agent roles; defining them would aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where revisions have been made to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The manuscript states that the system 'discovers optimizations beyond human-designed heuristics' and supplies 'results demonstrating that the framework can autonomously and progressively improve EDA tool at full million-line scale,' yet it provides no quantitative QoR deltas, number of evolution cycles, specific code changes, or comparisons against the baseline ABC binary; this absence is load-bearing for the central claim.

    Authors: We agree that the abstract would benefit from more explicit references to the quantitative results. In the revised manuscript, we have updated the abstract to reference the specific QoR improvements, evolution cycle counts, and baseline comparisons that are presented in the evaluation sections and figures. This provides readers with a clearer high-level indication of the evidence while preserving conciseness; the detailed deltas, cycle information, and code change descriptions remain in the body of the paper. revision: yes

  2. Referee: [Evaluation Loop] The correctness-plus-QoR loop relies exclusively on compilation and execution against the listed benchmark suites, without reported safeguards such as differential testing on unseen netlists, formal equivalence checking of modified passes, or coverage metrics on the evolved components; given the complexity of ABC's interacting C/C++ heuristics, this leaves open the possibility that reported gains reflect overfitting or undetected regressions rather than robust improvements.

    Authors: The referee correctly notes that our evaluation description could be strengthened with additional safeguards. In the revision, we have expanded the evaluation section to discuss the use of multiple independent benchmark suites as a primary mitigation against overfitting and to report coverage metrics for the evolved components. We also include results from differential testing on a held-out subset of netlists. Formal equivalence checking was applied selectively to critical passes using ABC's verification commands, but exhaustive application across all heuristic interactions is not feasible at this scale; we now explicitly acknowledge this limitation and its implications for robustness. revision: partial
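Differential testing of the kind described here reduces to driving the pre- and post-evolution implementations with identical stimuli and flagging any divergence. A toy editorial sketch, with plain Boolean callables standing in for simulating the same netlist through the baseline and evolved binaries:

```python
import random

def differential_test(f_baseline, f_evolved, n_inputs, trials=1000, seed=0):
    """Feed both implementations the same random input vectors and return
    the first counterexample on which their outputs diverge, or None if no
    divergence is observed (absence of a counterexample is evidence, not
    proof, of equivalence)."""
    rng = random.Random(seed)
    for _ in range(trials):
        vec = tuple(rng.randint(0, 1) for _ in range(n_inputs))
        if f_baseline(vec) != f_evolved(vec):
            return vec               # behaviors diverge on this stimulus
    return None
```

In the real flow the two callables would wrap netlist simulation through the two ABC binaries, and formal equivalence checking would then confirm or refute any suspect pass exhaustively, which is exactly where the acknowledged scaling limitation bites.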

Circularity Check

0 steps flagged

No circularity: empirical self-evolution evaluated on external benchmarks

full rationale

The paper describes an empirical system in which LLM agents iteratively modify the ABC source code under a correctness-plus-QoR loop and measure results on independent public benchmark suites (ISCAS 85/89/99, VTR, EPFL, IWLS 2005). No equations, fitted parameters, or self-referential metrics appear; the claimed QoR gains are not shown to reduce by construction to the initial bootstrap components or to any internal definition. The central claim rests on observable compilation success and benchmark execution rather than on a derivation chain that imports its own outputs. Self-citations, if present, are not load-bearing for the reported improvements, and the evaluation protocol uses externally defined suites that remain fixed across evolution cycles.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unproven capability of LLM agents to produce correct and improving code edits at million-line scale using only prompt-based guidance and benchmark feedback; no free parameters or invented entities are stated, but the domain assumption about reliable LLM code generation is load-bearing.

axioms (1)
  • domain assumption: LLM agents can generate functionally correct and QoR-improving modifications to a complex C++ codebase such as ABC when guided by programming prompts and an automated evaluation loop
    This assumption underpins the entire iterative evolution process described in the abstract.

pith-pipeline@v0.9.0 · 5533 in / 1263 out tokens · 65396 ms · 2026-05-10T09:48:37.915696+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Luca Amarú, Pierre-Emmanuel Gaillardon, and Giovanni De Micheli. 2015. The EPFL combinational benchmark suite. In Proc. Int'l Workshop on Logic & Synthesis (IWLS)

  2. [2]

    Satrajit Chatterjee, Robert K Brayton, and Alan Mishchenko. 2006. On resubstitution in logic synthesis. In ICCAD. 144–149

  3. [3]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021)

  4. [4]

    Scott Davidson. 1999. ITC'99 benchmark circuits-preliminary results. In International Test Conference 1999. Proceedings (IEEE Cat. No. 99CH37034). IEEE Computer Society, 1125–1125

  5. [5]

    Longfei Fan and Chang Wu. 2023. FPGA technology mapping with adaptive gate decomposition. In Proc. ACM/SIGDA Int'l Symposium on Field-Programmable Gate Arrays (FPGA). 135–140

  6. [6]

    Amur Ghose, Andrew B Kahng, Sayak Kundu, and Bodhisatta Pramanik. 2026. Agentic AI for Physical Design R&D: Status and Prospects. In Proceedings of the 2026 International Symposium on Physical Design. 133–141

  7. [7]

    Yujia Li, David Choi, Junyoung Chung, et al. 2022. Competition-Level Code Generation with AlphaCode. Science 378, 6624 (2022), 1092–1100

  8. [8]

    Yingjie Li, Mingju Liu, Haoxing Ren, Alan Mishchenko, and Cunxi Yu. 2024. DAG-aware synthesis orchestration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 12 (2024), 4666–4675

  9. [9]

    Junfeng Liu, Liwei Ni, Xingquan Li, Min Zhou, Lei Chen, Xing Li, Qinghua Zhao, and Shuai Ma. 2023. AiMap: Learning to improve technology mapping for ASICs via delay prediction. In 2023 IEEE 41st International Conference on Computer Design (ICCD). IEEE, 344–347

  10. [10]

    Mingju Liu, Daniel Robinson, Yingjie Li, Johannes Maximilian Kuehn, Rongjian Liang, Haoxing Ren, and Cunxi Yu. 2026. MapTune: Versatile ASIC technology mapping via reinforcement learning guided library tuning. ACM Transactions on Design Automation of Electronic Systems 31, 4 (2026), 1–21

  11. [11]

    Mingju Liu, Daniel Robinson, Yingjie Li, and Cunxi Yu. 2024. MapTune: Advancing ASIC Technology Mapping via Reinforcement Learning Guided Library Tuning. arXiv preprint arXiv:2407.18110 (2024)

  12. [12]

    Alan Mishchenko et al. 2007. ABC: A System for Sequential Synthesis and Verification. https://people.eecs.berkeley.edu/~alanmi/abc/

  13. [13]

    Alan Mishchenko and Robert Brayton. 2011. Scalable logic rewriting using don’t-cares. In DATE. 1432–1437

  14. [14]

    Alan Mishchenko, Robert K. Brayton, et al. 2010. ABC: A System for Sequential Synthesis and Verification. http://www.eecs.berkeley.edu/~alanmi/abc

  15. [15]

    Alan Mishchenko, Satrajit Chatterjee, and Robert K. Brayton. 2006. DAG-aware AIG rewriting: a fresh look at combinational logic synthesis. In Proc. 43rd Design Automation Conference (DAC). 532–535

  16. [16]

    Alan Mishchenko, Satrajit Chatterjee, and Robert K Brayton. 2006. DAG-aware AIG rewriting: a fresh look at combinational logic synthesis. In DAC. 532–535

  17. [17–18]

    Alan Mishchenko, Sungmin Cho, Satrajit Chatterjee, and Robert K. Brayton. 2007. Combinational and sequential mapping with priority cuts. In Proc. Int'l Conference on Computer-Aided Design (ICCAD). 354–361

  19. [19]

    Kevin E Murray, Oleg Petelin, Sheng Zhong, Jia Min Wang, Mohamed Eldafrawy, Jean-Philippe Legault, Eugene Sha, Aaron G Graham, Jean Wu, Matthew JP Walker, et al. 2020. VTR 8: High-performance CAD and customizable FPGA architecture modelling. ACM Transactions on Reconfigurable Technology and Systems (TRETS) 13, 2 (2020), 1–55

  20. [20]

    Walter L. Neto, Yingjie Li, Pierre-Emmanuel Gaillardon, and Cunxi Yu. 2023. FlowTune: End-to-End Automatic Logic Optimization Exploration via Domain-Specific Multiarmed Bandit. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 42, 6 (2023), 1912–1925

  21. [21]

    Walter Lau Neto, Matheus T Moreira, Yingjie Li, Luca Amarù, Cunxi Yu, and Pierre-Emmanuel Gaillardon. 2021. SLAP: A supervised learning approach for priority cuts technology mapping. In 2021 58th ACM/IEEE Design Automation Conference (DAC). IEEE, 859–864

  22. [22]

    Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Z. Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, M. Pawan Kumar, Abigail See, Swarat Chaudhuri, George Holland, Alex Davies, Sebastian Nowozin, Pushmeet Kohli, and Matej Balog. 2025. AlphaEvolve: A coding agent for scientific and al...

  23. [23]

    Ellen M. Sentovich, Kenneth J. Singh, Luciano Lavagno, Carl M. Pixley, et al. 1992. SIS: A System for Sequential Circuit Synthesis. Technical Report UCB/ERL M92/41. UC Berkeley, Electronics Research Laboratory

  24. [24]

    Ivan Smirnov et al. 2023. Machine-Learned Algorithmic Improvement. Nature (2023)

  25. [25]

    Luciano Stok, Darrin M. Kung, Trevor J. Chak, Daniel Brand, João P. Marques Silva, and Jochen A. G. Jess. 1996. BooleDozer: Logic Synthesis for ASICs. IBM Journal of Research and Development 40, 4 (1996), 407–430

  26. [26]

    Xufeng Yao, Jiaxi Jiang, Yuxuan Zhao, Peiyu Liao, Yibo Lin, and Bei Yu. 2026. EvoPlace: Evolution of Optimization Algorithms for Global Placement via Large Language Models. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2026)

  27. [27]

    Cunxi Yu. 2020. FlowTune: Practical multi-armed bandits in Boolean optimization. In Proceedings of the 39th International Conference on Computer-Aided Design. 1–9

  28. [28]

    Cunxi Yu, Maciej J. Ciesielski, and Alan Mishchenko. 2018. Fast algebraic rewriting based on And-Inverter Graphs. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems 37, 9 (2018), 1907–1911

  29. [29]

    Cunxi Yu, Rongjian Liang, Chia-Tung Ho, and Haoxing Ren. 2025. Autonomous Code Evolution Meets NP-Completeness. arXiv preprint arXiv:2509.07367 (2025)

  30. [30]

    Cunxi Yu, Houping Xiao, and Giovanni De Micheli. 2018. Developing synthesis flows without human knowledge. In Proc. 55th Design Automation Conference (DAC). 50:1–50:6

  31. [31]

    Kaisheng Zhu, Mingyu Liu, Haoxing Chen, Zhiyao Zhao, and David Z. Pan. 2020. Exploring logic optimizations with reinforcement learning and graph neural network. In Proc. ACM/IEEE Workshop on Machine Learning for CAD (MLCAD). 145–150

  32. [32]

    Kaisheng Zhu, Mingyu Liu, and David Pan. 2020. Exploring logic optimizations with reinforcement learning and graph neural networks. In MLCAD. 145–150