Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC
Pith reviewed 2026-05-10 09:48 UTC · model grok-4.3
The pith
LLM agents can autonomously rewrite sections of the full ABC logic synthesis codebase and discover new strategies that improve quality of results on standard benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a team of LLM-based agents, operating under programming guidance prompts and a unified correctness-plus-QoR evaluation loop, can progressively rewrite and evolve specific sub-components of the ABC codebase. Each cycle generates modifications, rebuilds the integrated binary, validates behavior, and scores results on multi-suite benchmarks. Through this closed feedback process the system identifies optimizations beyond human-designed heuristics and learns new synthesis strategies.
What carries the argument
The argument rests on a multi-agent self-evolution loop: prompt-driven code rewrites are applied to ABC sub-components, and each rewrite is then compiled, validated for correctness, and scored for QoR on the benchmark suites.
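The propose, build, validate, and score cycle described above can be sketched as a simple accept-if-better loop. This is only an illustration under stated assumptions: the `Candidate` type, the `evolve` function, and the four callables are hypothetical names, not the paper's implementation, and the sketch assumes a single scalar QoR where lower is better.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    """A source snapshot and the QoR score it earned (lower is better)."""
    source: str
    qor: float


def evolve(
    baseline: Candidate,
    propose: Callable[[str], str],   # hypothetical: agent rewrite of a sub-component
    build: Callable[[str], bool],    # hypothetical: compile the integrated binary
    validate: Callable[[str], bool], # hypothetical: correctness checks
    score: Callable[[str], float],   # hypothetical: benchmark QoR, lower is better
    cycles: int,
) -> Candidate:
    """Run a fixed number of cycles, keeping only validated improvements."""
    best = baseline
    for _ in range(cycles):
        patched = propose(best.source)
        if not build(patched):
            continue  # rewrite does not compile: discard it
        if not validate(patched):
            continue  # rewrite changes behavior: discard it
        qor = score(patched)
        if qor < best.qor:
            best = Candidate(patched, qor)  # strictly better: adopt it
    return best
```

A rewrite is scored only after it builds and validates, so a candidate that breaks the binary never competes on QoR. The real system may keep several candidates or use richer feedback; the summary does not say.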
If this is right
- The evolved ABC binary produces higher quality-of-results than the starting version on ISCAS 85/89/99, VTR, EPFL, and IWLS 2005 suites.
- New synthesis strategies emerge that were not present in the human-designed heuristics used at bootstrap.
- The framework can continue through multiple evolution cycles while preserving ABC's single-binary execution model and command interface.
- The same agent-driven rewrite process can be applied to other large open-source EDA components without changing their external interfaces.
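One common way to test the first consequence above is to aggregate per-circuit QoR ratios with a geometric mean. The sketch below assumes a "lower is better" metric such as node count or delay; the function name `geomean_ratio` and the dictionary layout are illustrative choices, not something the paper specifies.

```python
import math
from typing import Mapping


def geomean_ratio(baseline: Mapping[str, float], evolved: Mapping[str, float]) -> float:
    """Geometric mean of evolved/baseline QoR over circuits present in both maps.

    A result below 1.0 means the evolved binary is better on average,
    assuming a lower-is-better metric.
    """
    common = sorted(baseline.keys() & evolved.keys())
    if not common:
        raise ValueError("no circuits in common")
    log_sum = 0.0
    for name in common:
        if baseline[name] <= 0 or evolved[name] <= 0:
            raise ValueError(f"non-positive QoR value for {name}")
        log_sum += math.log(evolved[name] / baseline[name])
    return math.exp(log_sum / len(common))
```

The geometric mean keeps one very large circuit from dominating the average, which matters when a suite mixes small ISCAS designs with much larger ones.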
Where Pith is reading between the lines
- If the method scales to million-line codebases, similar agent teams could be pointed at other long-lived synthesis or verification tools to reduce manual maintenance effort.
- Success would imply that prompt-based agents can discover domain-specific optimizations that human engineers have overlooked in complex software.
- A practical next test would be to freeze the evolved code and evaluate it on entirely new benchmark families to check whether improvements generalize beyond the training loop.
- Over longer horizons the same loop could be run continuously as new hardware targets appear, letting the tool adapt without human intervention.
Load-bearing premise
LLM agents guided only by prompts and an evaluation loop can reliably produce functionally correct code changes that deliver genuine QoR gains on the full ABC codebase without introducing subtle bugs or overfitting to the benchmarks.
What would settle it
Take the final evolved ABC binary and run it on a fresh collection of circuits drawn from a different source than the evolution benchmarks; if the reported QoR gains disappear or if the binary produces incorrect outputs on edge cases, the central claim fails.
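The settling test above can be written down as a small decision rule. This is a sketch under assumptions: `HeldOutResult`, `verdict`, and the `min_gain` threshold are hypothetical, the QoR metric is taken to be lower-is-better, and the per-circuit numbers would come from running both binaries on circuits that were never used during evolution.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HeldOutResult:
    circuit: str
    baseline_qor: float  # metric from the starting binary (lower is better)
    evolved_qor: float   # same metric from the evolved binary
    equivalent: bool     # did an equivalence check pass on the evolved output?


def verdict(results: Iterable[HeldOutResult], min_gain: float = 0.0) -> str:
    """Apply the two failure conditions: incorrect outputs, or vanished gains."""
    rows = list(results)
    if not rows:
        raise ValueError("need at least one held-out circuit")
    broken = [r.circuit for r in rows if not r.equivalent]
    if broken:
        return "fails: incorrect output on " + ", ".join(broken)
    # Geometric-mean improvement over the held-out circuits.
    mean_log = sum(math.log(r.evolved_qor / r.baseline_qor) for r in rows) / len(rows)
    gain = 1.0 - math.exp(mean_log)
    if gain <= min_gain:
        return "fails: gains do not carry over"
    return "survives"
```

Correctness is checked before QoR on purpose: a smaller netlist that computes the wrong function is not an improvement.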
Original abstract
This paper introduces the first self-evolving logic synthesis framework, which leverages Large Language Model (LLM) agents to autonomously improve the source code of ABC, the widely adopted logic synthesis system. Our framework operates on the entire integrated ABC codebase, and the output repository preserves its single-binary execution model and command interface. In the initial evolution cycle, we bootstrap the system using existing prior open-source synthesis components, covering flow tuning, logic minimization, and technology mapping, but without manually injecting new heuristics. On top of this foundation, a team of LLM-based agents iteratively rewrites and evolves specific sub-components of ABC following our "programming guidance" prompts under a unified correctness and QoR-driven evaluation loop. Each evolution cycle proposes code modifications, compiles the integrated binary, validates correctness, and evaluates quality-of-results (QoR) on multi-suite benchmarks including ISCAS 85/89/99, VTR, EPFL, and IWLS 2005. Through continuous feedback, the system discovers optimizations beyond human-designed heuristics, effectively learning new synthesis strategies that enhance QoR. We detail the architecture of this self-improving system, its integration with ABC, and results demonstrating that the framework can autonomously and progressively improve EDA tool at full million-line scale.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a multi-agent LLM framework that autonomously evolves the full ABC logic synthesis codebase (preserving its single-binary interface) by iteratively rewriting sub-components via prompt-guided modifications. It bootstraps from prior open-source components and uses a unified loop of compilation, correctness validation, and QoR evaluation on ISCAS 85/89/99, VTR, EPFL, and IWLS 2005 suites, claiming to discover new synthesis strategies that progressively improve QoR beyond human-designed heuristics at million-line scale.
Significance. If the central claim holds with verifiable evidence, the work would be significant as the first demonstration of self-evolving EDA tools at the scale of a production synthesis system, potentially reducing reliance on manual heuristic tuning and enabling continuous optimization. The preservation of ABC's command interface and use of public multi-suite benchmarks are practical strengths that support reproducibility.
major comments (2)
- [Abstract] Abstract: the manuscript states that the system 'discovers optimizations beyond human-designed heuristics' and supplies 'results demonstrating that the framework can autonomously and progressively improve EDA tool at full million-line scale,' yet provides no quantitative QoR deltas, number of evolution cycles, specific code changes, or comparisons against the baseline ABC binary; this absence is load-bearing for the central claim.
- [Evaluation Loop] Evaluation description: the correctness-plus-QoR loop relies exclusively on compilation and execution against the listed benchmark suites without reported safeguards such as differential testing on unseen netlists, formal equivalence checking of modified passes, or coverage metrics on the evolved components; given the complexity of ABC's interacting C/C++ heuristics, this leaves open the possibility that reported gains reflect overfitting or undetected regressions rather than robust improvements.
minor comments (1)
- [Abstract] The abstract and text use 'programming guidance' in quotes without defining the exact prompt templates or agent roles, which would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where revisions have been made to strengthen the manuscript.
- Referee: [Abstract] Abstract: the manuscript states that the system 'discovers optimizations beyond human-designed heuristics' and supplies 'results demonstrating that the framework can autonomously and progressively improve EDA tool at full million-line scale,' yet provides no quantitative QoR deltas, number of evolution cycles, specific code changes, or comparisons against the baseline ABC binary; this absence is load-bearing for the central claim.
  Authors: We agree that the abstract would benefit from more explicit references to the quantitative results. In the revised manuscript, we have updated the abstract to reference the specific QoR improvements, evolution cycle counts, and baseline comparisons that are presented in the evaluation sections and figures. This provides readers with a clearer high-level indication of the evidence while preserving conciseness; the detailed deltas, cycle information, and code change descriptions remain in the body of the paper.
  Revision: yes
- Referee: [Evaluation Loop] Evaluation description: the correctness-plus-QoR loop relies exclusively on compilation and execution against the listed benchmark suites without reported safeguards such as differential testing on unseen netlists, formal equivalence checking of modified passes, or coverage metrics on the evolved components; given the complexity of ABC's interacting C/C++ heuristics, this leaves open the possibility that reported gains reflect overfitting or undetected regressions rather than robust improvements.
  Authors: The referee correctly notes that our evaluation description could be strengthened with additional safeguards. In the revision, we have expanded the evaluation section to discuss the use of multiple independent benchmark suites as a primary mitigation against overfitting and to report coverage metrics for the evolved components. We also include results from differential testing on a held-out subset of netlists. Formal equivalence checking was applied selectively to critical passes using ABC's verification commands, but exhaustive application across all heuristic interactions is not feasible at this scale; we now explicitly acknowledge this limitation and its implications for robustness.
  Revision: partial
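The equivalence-checking safeguard discussed in this exchange could look roughly like the following, using ABC's `cec` command through its `-c` batch flag. The helper names are hypothetical, and the matched output phrases ("Networks are equivalent", "NOT EQUIVALENT") are an assumption based on typical ABC builds; they should be confirmed against the binary actually under test.

```python
import subprocess
from typing import Callable, List


def cec_command(abc_binary: str, golden: str, revised: str) -> List[str]:
    """Build an ABC batch invocation that compares two netlist files.

    Paths are assumed to contain no spaces, since they are embedded
    in a single ABC command string.
    """
    return [abc_binary, "-c", f"cec {golden} {revised}"]


def classify_cec_output(text: str) -> str:
    """Map ABC's textual report onto three outcomes (assumed phrasing)."""
    lowered = text.lower()
    if "not equivalent" in lowered:
        return "mismatch"
    if "are equivalent" in lowered:
        return "equivalent"
    return "undecided"  # timeout, parse error, or unrecognized output


def _run_abc(cmd: List[str]) -> str:
    """Default runner: execute ABC and return everything it printed."""
    done = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
    return done.stdout + done.stderr


def check_equivalence(
    abc_binary: str,
    golden: str,
    revised: str,
    run: Callable[[List[str]], str] = _run_abc,
) -> str:
    """Compare the baseline output netlist against the evolved one."""
    return classify_cec_output(run(cec_command(abc_binary, golden, revised)))
```

Treating anything that is not an explicit "equivalent" report as `undecided` errs on the side of rejecting a rewrite, which fits the referee's concern about undetected regressions. The `run` parameter exists so the logic can be exercised without an ABC binary installed.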
Circularity Check
No circularity: empirical self-evolution evaluated on external benchmarks
The paper describes an empirical system in which LLM agents iteratively modify the ABC source code under a correctness-plus-QoR loop and measure results on independent public benchmark suites (ISCAS 85/89/99, VTR, EPFL, IWLS 2005). No equations, fitted parameters, or self-referential metrics appear; the claimed QoR gains are not shown to reduce by construction to the initial bootstrap components or to any internal definition. The central claim rests on observable compilation success and benchmark execution rather than on a derivation chain that imports its own outputs. Self-citations, if present, are not load-bearing for the reported improvements, and the evaluation protocol uses externally defined suites that remain fixed across evolution cycles.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLM agents can generate functionally correct and QoR-improving modifications to a complex C/C++ codebase such as ABC when guided by programming prompts and an automated evaluation loop.