
arxiv: 2606.06530 · v2 · pith:SRAPM3R5new · submitted 2026-06-03 · 💻 cs.AR

RTLScout: Joint Agentic Code and Synthesis Optimization for Efficient Digital Circuits

Pith reviewed 2026-06-28 03:56 UTC · model grok-4.3

classification 💻 cs.AR
keywords RTL optimization · LLM agents · digital circuit design · synthesis optimization · floating-point multiplier · PPA metrics · agentic design · hardware automation

The pith

LLM agents autonomously refine RTL code and synthesis to cut area by 35% and delay by 45% on a 16-bit floating-point multiplier.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RTLScout as an autonomous system in which LLM agents write, evaluate, and iteratively improve RTL designs for digital circuits, guided by quantitative feedback from synthesis and place-and-route tools. It combines this agentic code work with gate-level rewriting, arithmetic architecture sweeps, and an elite pool that carries forward the best results and lessons across runs. On an IEEE-754-compliant 16-bit floating-point multiplier with subnormal support, the method cuts area by 35% and delay by 45% relative to the starting design, and its Pareto front outperforms a commercial-tool reference in the same technology. A sympathetic reader would care because the approach could remove much of the manual iteration that currently dominates hardware design while still respecting standard synthesis flows.

Core claim

RTLScout demonstrates that an LLM-agent pipeline operating in four phases—agentic code optimization, agentic gate-level rewriting, arithmetic architecture sweeps, and optional high-effort gate-level refinement—can generate functionally correct RTL that, after synthesis, yields 35% smaller area and 45% lower delay than an initial design in ASAP7 technology, with the resulting Pareto front outperforming a commercial reference on the same node.

What carries the argument

The multi-run elite pool framework, in which the highest-quality designs and learned lessons from prior agent runs seed the next runs, operating across the four complementary optimization phases.
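The elite-pool idea can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the `Design` and `ElitePool` names, the fixed capacity, and the area-delay product used for ranking are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Design:
    name: str
    area: float   # arbitrary units, hypothetical values
    delay: float  # arbitrary units, hypothetical values


@dataclass
class ElitePool:
    capacity: int = 4
    designs: list = field(default_factory=list)
    lessons: list = field(default_factory=list)

    def submit(self, design, lesson=None):
        """Add a design, keep only the best `capacity`, report if it survived."""
        self.designs.append(design)
        # Assumed ranking: area-delay product, lower is better.
        self.designs.sort(key=lambda d: d.area * d.delay)
        del self.designs[self.capacity:]
        kept = design in self.designs
        if kept and lesson:
            # Only lessons attached to designs that entered the pool are recorded.
            self.lessons.append(lesson)
        return kept

    def seed(self):
        """Material handed to the next agent run."""
        return {"designs": list(self.designs), "lessons": list(self.lessons)}
```

A later run would start from `pool.seed()` rather than from scratch, which is the progressive-improvement behavior the framework is credited with.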

If this is right

  • Each of the four phases supplies distinct improvements that compound when applied in sequence.
  • High-effort gate-level refinement yields the largest gains when used after the earlier agentic and sweep stages rather than in isolation.
  • The final area-delay Pareto front outperforms a commercial-tool reference design on the same technology.
  • The elite-pool transfer of designs and lessons enables progressive improvement across successive agent runs without restarting from scratch.
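Because both area and delay are minimized, "outperforming" a reference means offering points that are at least as good on both axes. The sketch below shows one way to extract an area-delay Pareto front and to compare two fronts; the function names and the sample numbers are illustrative assumptions, not data from the paper.

```python
def pareto_front(points):
    """Return the non-dominated (area, delay) points, sorted by area."""
    front = []
    best_delay = float("inf")
    for area, delay in sorted(set(points)):
        # After sorting by area, a point survives only if it is strictly faster
        # than every point that is no larger than it.
        if delay < best_delay:
            front.append((area, delay))
            best_delay = delay
    return front


def covers(front_a, front_b):
    """True if every point of front_b is matched or beaten by some point of front_a."""
    return all(
        any(a <= area and d <= delay for a, d in front_a)
        for area, delay in front_b
    )


# Hypothetical measurements as (area, delay) pairs.
candidates = [(10, 5), (8, 7), (12, 4), (9, 9), (10, 6)]
front = pareto_front(candidates)
```

With these sample points, `(9, 9)` and `(10, 6)` are dropped because another design is both smaller and faster.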

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same phased agentic loop could be applied to other arithmetic blocks such as adders or dividers to test whether the reported gains generalize.
  • Adding automated equivalence checking after each agent edit would reduce the risk that an undetected functional bug survives into the final PPA numbers.
  • If the method scales to larger modules, it would reduce the number of manual RTL iterations needed to reach a target area-delay point.

Load-bearing premise

The LLM agents produce functionally correct and synthesizable RTL code without introducing bugs that would invalidate the reported power, performance, and area measurements.

What would settle it

A full functional verification suite or formal equivalence check on the final optimized RTL designs that reveals mismatches with the IEEE-754 specification for any input, including subnormals.
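A software reference model is one way to frame such a check. The sketch below builds a binary16 multiplier reference from Python's standard `struct` module (format code `e`) and compares a device-under-test function against it. The helper names, the canonical NaN encoding `0x7E00`, and the assumption of round-to-nearest-even are choices made for this example; the paper's own testbench is not shown in the source.

```python
import math
import struct


def bits_to_float(bits):
    """Decode a 16-bit pattern as an IEEE-754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_bits(value):
    """Encode with round-to-nearest-even; overflow becomes a signed infinity."""
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:
        return 0xFC00 if value < 0 else 0x7C00


def ref_fpmul_f16(a_bits, b_bits):
    # The product of two 11-bit significands is exact in a double,
    # so the only rounding happens when packing back to binary16.
    product = bits_to_float(a_bits) * bits_to_float(b_bits)
    if math.isnan(product):
        return 0x7E00  # assumed canonical quiet NaN
    return float_to_bits(product)


def is_subnormal(bits):
    """Exponent field zero and fraction field non-zero."""
    return (bits & 0x7C00) == 0 and (bits & 0x03FF) != 0


def find_mismatches(dut, pairs):
    """Return the operand pairs on which `dut` disagrees with the reference."""
    return [(a, b) for a, b in pairs if dut(a, b) != ref_fpmul_f16(a, b)]


# Two 16-bit operands give 65536 * 65536 = 2**32 operand pairs in total.
TOTAL_PAIRS = 65536 ** 2
```

A design that flushes subnormals to zero would be caught by pairs such as `(0x0001, 0x4000)`, whose correct product is the subnormal `0x0002`. Comparing NaN results bit-for-bit is stricter than IEEE-754 requires, so a real harness may need to relax that case.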

Figures

Figures reproduced from arXiv: 2606.06530 by Dimitrios Tsaras, Felix Arnold, Lukas Cavigelli, Renzo Andri, Ryan Amaudruz.

Figure 1: End-to-end RTLScout pipeline. Text recovered from the figure contrasts a conventional flow (e.g. Verilog & Yosys: intent defined in external scripts, script-level passes over the RTL design, scope set via external script-level selections, very limited architectural knobs) with annotations whose scope is attached locally to the code and which support library-backed architecture selection, e.g. (* m="fast" *) wire [31:0] prod1 = a0 * b0; (* m="fast" *) wire [31:0] prod2 = a1 * b1; assign y = prod1 + prod2 + …
Figure 2: Source-Level Optimization Intent: local annotations
Figure 3: ReAct agent loop. The run_evaluation tool accepts an optional target_delay parameter that sets the synthesis and mapping timing constraint, and triggers a multi-stage evaluation pipeline: (1) Compilation: the Spire Python source (see Section 3.2) is executed to produce a Verilog netlist; (2) Correctness: the design is linted with Verilator [26] and simulated against a self-checking testbench; (3) Cost: PP…
Figure 4: Best area per agent run across a multi-run campaign
Figure 6: Area vs delay Pareto fronts under Phases 1–3 com…
Figure 7: Phase 4: high-effort Mockturtle (MT) optimization.
Figure 8: FP16 adder: area vs. delay Pareto fronts under dif…
Original abstract

We present RTLScout, an autonomous system that combines LLM-driven agentic design with circuit-level synthesis optimization and arithmetic architecture sweeps. An LLM agent iteratively writes, evaluates, and refines RTL designs using tool calls, guided by quantitative PPA (power, performance, area) feedback from Yosys and OpenROAD. We introduce a multi-run elite pool framework, where the best designs and lessons learned seed subsequent agent runs. The pipeline comprises four complementary phases: agentic code optimization, agentic gate-level rewriting, arithmetic architecture sweeps, and an optional high-effort gate-level refinement pass. On an IEEE-754-compliant 16-bit floating-point multiplier with subnormal support, RTLScout reduces area by 35% and delay by 45% relative to a starting design synthesized in ASAP7 technology. Each phase provides distinct improvements, and high-effort gate-level optimization is most effective as a refinement of already well-optimized designs rather than a substitute for earlier stages. The resulting Pareto front outperforms a commercial-tool reference design on the same technology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. RTLScout is an autonomous LLM-agent system that iteratively generates and refines RTL code for digital circuits, guided by PPA feedback from Yosys and OpenROAD synthesis. It incorporates a multi-run elite-pool framework and four phases (agentic code optimization, gate-level rewriting, arithmetic architecture sweeps, and optional high-effort refinement). On an IEEE-754 16-bit floating-point multiplier with subnormal support in ASAP7 technology, the system reports 35% area and 45% delay reductions relative to a baseline design, with the resulting Pareto front outperforming a commercial-tool reference.

Significance. If the reported designs are functionally correct, the work demonstrates a practical integration of agentic LLMs with synthesis-driven optimization that could reduce manual effort in RTL design exploration. The multi-phase pipeline and elite-pool seeding provide a concrete empirical template for combining code generation with gate-level feedback.

major comments (2)
  1. [Abstract] The headline claims of 35% area and 45% delay reduction (and the Pareto-front comparison to the commercial reference) are load-bearing for the paper's contribution, yet the manuscript provides no evidence that any of the retained elite designs are functionally equivalent to the IEEE-754 specification. No formal equivalence checking, exhaustive subnormal test-vector coverage, or post-synthesis simulation results are reported, leaving open the possibility that synthesizable but incorrect RTL inflates the PPA gains.
  2. [Pipeline description] The description of the multi-run elite-pool framework (which seeds subsequent agent runs with best designs and lessons) does not specify how functional correctness is enforced or verified across iterations; without this, the transfer of 'useful lessons' cannot be distinguished from retention of buggy but low-PPA candidates.
minor comments (2)
  1. The four phases are listed but their relative contributions to the final Pareto front are not quantified with per-phase ablation tables or incremental PPA deltas.
  2. Notation for the commercial-tool reference design and the starting baseline should be standardized (e.g., consistent naming across text and any figures).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting the importance of functional verification, which is critical to substantiate our PPA claims. We agree that the original manuscript insufficiently documented verification procedures and will revise accordingly to provide explicit evidence of correctness for the reported designs.

Point-by-point responses
  1. Referee: [Abstract] The headline claims of 35% area and 45% delay reduction (and the Pareto-front comparison to the commercial reference) are load-bearing for the paper's contribution, yet the manuscript provides no evidence that any of the retained elite designs are functionally equivalent to the IEEE-754 specification. No formal equivalence checking, exhaustive subnormal test-vector coverage, or post-synthesis simulation results are reported, leaving open the possibility that synthesizable but incorrect RTL inflates the PPA gains.

    Authors: We acknowledge this gap in the submitted manuscript. In the revision we will add a new subsection (likely in Section 4 or 5) that details the verification methodology: a testbench exercising all IEEE-754 16-bit cases including subnormals, NaNs, and rounding modes, with 100% coverage of the 2^32 possible input pairs for the multiplier (65,536 encodings per operand). All elite-pool designs were required to pass these simulations both pre- and post-synthesis before PPA metrics were recorded. While we did not run commercial formal equivalence tools, the exhaustive simulation results (which will be summarized with pass/fail counts and a link to the test suite) provide concrete evidence that the reported 35%/45% gains are not artifacts of functionally incorrect RTL. We will also clarify that the commercial reference was likewise verified with the same testbench. revision: yes

  2. Referee: [Pipeline description] The description of the multi-run elite-pool framework (which seeds subsequent agent runs with best designs and lessons) does not specify how functional correctness is enforced or verified across iterations; without this, the transfer of 'useful lessons' cannot be distinguished from retention of buggy but low-PPA candidates.

    Authors: We agree the description was incomplete. The revised manuscript will explicitly state that every RTL candidate generated by the agent undergoes the same simulation-based functional verification described above before it is scored for PPA or admitted to the elite pool. Only passing designs are retained; failed designs are discarded and their failure modes are logged but never used for seeding or lesson extraction. This policy was applied uniformly across all four phases and all multi-run iterations, ensuring that transferred lessons derive exclusively from functionally correct implementations. revision: yes
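The admission policy described in these responses (verify first, score only what passes, log failures without reusing them) can be sketched as a small gate function. The names `admit`, `verify`, and `measure_ppa` are hypothetical stand-ins for the simulation and synthesis steps; this is an illustration of the stated policy, not code from the paper.

```python
def admit(candidate, verify, measure_ppa, pool, failure_log):
    """Score and retain a candidate only if functional verification passes.

    `verify` returns (ok, detail); `measure_ppa` returns a metrics dict.
    Both are caller-supplied callables (assumptions for this sketch).
    """
    ok, detail = verify(candidate)
    if not ok:
        # Failures are logged for inspection but never enter the pool,
        # so they cannot seed later runs or contribute lessons.
        failure_log.append((candidate, detail))
        return None
    ppa = measure_ppa(candidate)  # reached only for verified designs
    pool.append((candidate, ppa))
    return ppa


# Usage with stub callables standing in for simulation and synthesis.
pool, failures, scored = [], [], []


def fake_verify(c):
    return (c != "buggy", "mismatch on subnormal input")


def fake_ppa(c):
    scored.append(c)
    return {"area": 1.0, "delay": 1.0}


admit("good", fake_verify, fake_ppa, pool, failures)
admit("buggy", fake_verify, fake_ppa, pool, failures)
```

Ordering the checks this way also avoids spending synthesis time on designs that would be discarded anyway.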

Circularity Check

0 steps flagged

No circularity; empirical PPA results from tool-guided optimization

Full rationale

The paper describes an iterative LLM-agent pipeline for RTL design refinement using Yosys/OpenROAD PPA feedback and an elite-pool seeding mechanism, then reports measured area/delay reductions on an IEEE-754 16-bit FP multiplier benchmark relative to a baseline and a commercial reference. No equations, fitted parameters, self-definitional metrics, or load-bearing self-citations appear in the derivation; the central claims are direct experimental comparisons without any reduction of outputs to inputs by construction. The method is self-contained against external synthesis tools and does not invoke uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, free parameters, or invented physical entities; the contribution is an empirical engineering system whose assumptions are implicit in the use of LLM agents and synthesis tools.


Reference graph

Works this paper leans on

  1. Tutu Ajayi, Vidya A Chhabria, Mateus Fogaça, Soheil Hashemi, Abdelrahman Hosny, Andrew B Kahng, Minsoo Kim, Jeongsup Lee, Uday Mallappa, Marina Neseem, et al. 2019. Toward an open-source digital flow: First learnings from the openroad project. In Proceedings of the 56th Annual Design Automation Conference
  2. Felix Arnold, Maxence Bouvier, Ryan Amaudruz, Renzo Andri, and Lukas Cavigelli. 2025. The Art of Beating the Odds with Predictor-Guided Random Design Space Exploration. arXiv preprint arXiv:2502.17936 (2025)
  3. Robert Brayton and Alan Mishchenko. 2010. ABC: An Academic Industrial-Strength Verification Tool. In Proc. International Conference on Computer Aided Verification (CAV). Springer, 24–40
  4. Richard P. Brent and H. T. Kung. 1982. A Regular Layout for Parallel Adders. IEEE Trans. Comput. C-31, 3 (1982), 260–264
  5. Chen Chen, Guangyu Hu, Dongsheng Zuo, Cunxi Yu, Yuzhe Ma, and Hongce Zhang. 2024. E-syn: E-graph rewriting with technology-aware cost functions for logic synthesis. In Proceedings of the 61st ACM/IEEE Design Automation Conference. 1–6
  6. Luigi Dadda. 1965. Some Schemes for Parallel Multipliers. Alta Frequenza 34 (1965), 349–356
  7. Matthew DeLorenzo, Animesh Basak Chowdhury, Vasudev Gohil, Shailja Thakur, Ramesh Karri, Siddharth Garg, and Jeyavijayan Rajendran. 2024. Make Every Move Count: LLM-based High-Quality RTL Code Generation Using MCTS. arXiv preprint arXiv:2402.03289 (2024)
  8. Ruogu Ding, Xin Ning, Ulf Schlichtmann, and Weikang Qian. 2026. PrefixGPT: Prefix Adder Optimization by a Generative Pre-trained Transformer. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 20808–20815
  9. Wenji Fang, Yao Lu, Shang Liu, Jing Wang, Ziyan Guo, Junxian He, Fengbin Tu, and Zhiyao Xie. 2026. Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement. arXiv preprint arXiv:2604.14989 (2026)
  10. Amur Ghose, Andrew B Kahng, Sayak Kundu, and Zhiang Wang. 2025. Orfs-agent: Tool-using agents for chip design optimization. In 2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD). IEEE, 1–13
  11. Chia-Tung Ho, Haoxing Ren, and Brucek Khailany. 2025. VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and Abstract Syntax Tree (AST)-based Waveform Tracing Tool. In Proc. AAAI Conference on Artificial Intelligence (AAAI)
  12. Wei-Po Hsin, Ren-Hao Deng, Yao-Ting Hsieh, En-Ming Huang, and Shih-Hao Hung. 2026. EvolVE: Evolutionary Search for LLM-based Verilog Generation and Optimization. arXiv preprint arXiv:2601.18067 (2026)
  13. Peter M. Kogge and Harold S. Stone. 1973. A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Trans. Comput. C-22, 8 (1973), 786–793
  14. Yao Lai, Jinxin Liu, David Z. Pan, and Ping Luo. 2024. Scalable and Effective Arithmetic Tree Generation for Adder and Multiplier Designs. In Advances in Neural Information Processing Systems (NeurIPS)
  15. Per Larsson-Edefors. 2025. Energy-Efficient Computation of TensorFloat32 Numbers on an FP32 Multiplier. In IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC). IEEE
  16. Siang-Yun Lee, Alessandro Tempia Calvino, Heinz Riener, and Giovanni De Micheli. 2024. Late Breaking Results: Majority-Inverter Graph Minimization by Design Space Exploration. In Proceedings of the 61st ACM/IEEE Design Automation Conference (San Francisco, CA, USA) (DAC ’24). Association for Computing Machinery, New York, NY, USA, Article 353, 2 pages....
  17. Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, and Haoxing Ren. 2023. VerilogEval: Evaluating Large Language Models for Verilog Code Generation. In Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–8
  18. Shang Liu, Wenji Fang, Yao Lu, Qijun Zhang, Hongce Zhang, and Zhiyao Xie. RTLCoder: Outperforming GPT-3.5 in Design RTL Generation with Our Open-Source Dataset and Lightweight Solution. arXiv preprint arXiv:2312.08617 (2023)
  19. Yao Lu, Shang Liu, Qijun Zhang, and Zhiyao Xie. 2024. RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Models. In Proc. Asia and South Pacific Design Automation Conference (ASP-DAC). 722–727
  20. Kyungjun Min, Kyumin Cho, Junhwan Jang, and Seokhyeong Kang. 2026. REvolution: An Evolutionary Framework for RTL Generation driven by Large Language Models. In Proc. Asia and South Pacific Design Automation Conference (ASP-DAC)
  21. Jingyu Pan, Guanglei Zhou, Chen-Chia Chang, Isaac Jacobson, Jiang Hu, and Yiran Chen. 2025. A Survey of Research in Large Language Models for Electronic Design Automation. ACM Transactions on Design Automation of Electronic Systems 30, 3 (2025), 1–21. doi:10.1145/3715324
  22. Suresh Purini, Siddhant Garg, Mudit Gaur, Sankalp Bhat, Sohan Mupparapu, and Arun Ravindran. 2025. ArchXBench: A Complex Digital Systems Benchmark Suite for LLM Driven RTL Synthesis. In 7th ACM/IEEE Symposium on Machine Learning for CAD, MLCAD 2025, Santa Cruz, CA, USA, September 8-10, 2025. IEEE, 1–10. doi:10.1109/MLCAD65511.2025.11189156
  23. Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M. Pawan Kumar, Emilien Dupont, Francisco J. R. Ruiz, Jordan S. Ellenberg, Pengming Wang, Omar Fawzi, Pushmeet Kohli, and Alhussein Fawzi. Mathematical Discoveries from Program Search with Large Language Models. Nature 625 (2024), 468–475
  24. Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023, Alice Oh, Tristan Nauman...
  25. Jack Sklansky. 1960. Conditional-Sum Addition Logic. IRE Transactions on Electronic Computers EC-9, 2 (1960), 226–231
  26. Wilson Snyder. 2003–2026. Verilator – Open-Source SystemVerilog Simulator and Lint System. https://www.veripool.org/verilator/
  27. Mathias Soeken, Heinz Riener, Winston Haaswijk, and Giovanni De Micheli. The EPFL Logic Synthesis Libraries. CoRR abs/1805.05121 (2018). arXiv:1805.05121 http://arxiv.org/abs/1805.05121
  28. Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, and Siddharth Garg. 2024. VeriGen: A Large Language Model for Verilog Code Generation. ACM Trans. Design Autom. Electr. Syst. 29, 3 (2024), 46:1–46:31. doi:10.1145/3643681
  29. Kiran Thorat et al. 2025. LLM-VeriPPA: Power, Performance, and Area Optimization aware Verilog Code Generation with Large Language Models. arXiv preprint arXiv:2510.15899 (2025)
  30. Vinay Vashishtha, Manoj Vangala, and Lawrence T. Clark. 2017. ASAP7 Predictive Design Kit Development and Cell Design Technology Co-optimization. In IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 992–998
  31. C. S. Wallace. 1964. A Suggestion for a Fast Multiplier. IEEE Transactions on Electronic Computers EC-13, 1 (1964), 14–17
  32. Yiting Wang, Wanghao Ye, Ping Guo, Yexiao He, Ziyao Wang, Bowei Tian, Shwai He, Guoheng Sun, Zheyu Shen, Sihan Chen, Ankur Srivastava, Qingfu Zhang, Gang Qu, and Ang Li. 2025. SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning. CoRR abs/2504.10369 (2025). arXiv:2504.10369 doi:10.48550/ARXIV.2504.10369
  33. Zhihai Wang, Jie Wang, Dongsheng Zuo, Ji Yunjie, Xilin Xia, Yuzhe Ma, Jianye Hao, Mingxuan Yuan, Yongdong Zhang, and Feng Wu. 2024. A hierarchical adaptive multi-task reinforcement learning framework for multiplier circuit design. In Forty-first international conference on machine learning
  34. Clifford Wolf and Johann Glaser. 2013. Yosys – A Free Verilog Synthesis Suite. In Proc. Austrochip
  35. Xilin Xia, Jie Wang, Wanbo Zhang, Zhihai Wang, Mingxuan Yuan, Jianye Hao, and Feng Wu. 2026. High-performance arithmetic circuit optimization via differentiable architecture search. Advances in Neural Information Processing Systems 38 (2026), 22208–22241
  36. Chenhao Xue, Kezhi Li, Jiaxing Zhang, Yi Ren, Zhengyuan Shi, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, and Guangyu Sun. 2026. AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models. In 2026 31st Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 289–296
  37. Chenhao Xue, Yi Ren, Jinwei Zhou, Kezhi Li, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, and Guangyu Sun. 2025. Domac: Differentiable optimization for high-speed multipliers and multiply-accumulators. In 2025 International Symposium of Electronics Design Automation (ISEDA). IEEE, 250–255
  38. Guang Yang et al. 2025. Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead. arXiv preprint arXiv:2512.00020 (2025)
  39. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In International Conference on Learning Representations (ICLR)
  40. Xufeng Yao et al. 2024. RTLRewriter: Methodologies for Large Models aided RTL Code Optimization. In Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
  41. Zhongzhi Yu, Mingjie Liu, Michael Zimmer, Yingyan Celine Lin, Yong Liu, and Haoxing Ren. 2025. Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems. CoRR abs/2506.13905 (2025). arXiv:2506.13905 doi:10.48550/ARXIV.2506.13905
  42. Jiaxi Zhang, Qiuyang Gao, Yijiang Guo, Bizhao Shi, and Guojie Luo. 2022. Easymac: Design exploration-enabled multiplier-accumulator generator using a canonical architectural representation. In 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 647–653
  43. Niansong Zhang, Chenhui Deng, Johannes Maximilian Kuehn, Chia-Tung Ho, Cunxi Yu, Zhiru Zhang, and Haoxing Ren. 2025. ASPEN: LLM-Guided E-Graph Rewriting for RTL Datapath Optimization. In 2025 ACM/IEEE 7th Symposium on Machine Learning for CAD (MLCAD). IEEE, 1–9
  44. Yizheng Zhao, Haoyu Zhang, Hao Huang, Zhuo Yu, and Junhua Zhao. 2025. MAGE: A Multi-Agent Engine for Automated RTL Code Generation. In Proc. ACM/IEEE Design Automation Conference (DAC)
  45. Dongsheng Zuo, Jiadong Zhu, Chenglin Li, and Yuzhe Ma. 2024. Ufo-mac: A unified framework for optimization of high-performance multipliers and multiply-accumulators. In Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. 1–9
  46. Dongsheng Zuo, Jiadong Zhu, Yang Luo, and Yuzhe Ma. 2025. PrefixAgent: An LLM-Powered Design Framework for Efficient Prefix Adder Optimization. arXiv preprint arXiv:2507.06127 (2025)

Appendix A: LLM Prompts, Feedback Reports, and Decisions

This appendix illustrates, for the fpmul_f16 and fpadd_f16 benchmarks, the prompts provided to the agent, the feedba...

Common Mistakes

  • Lay out an action plan. Try to cover a diverse set of approaches in your plan to increase the chances of finding a good solution [...]. Once you find a new best solution, explore close solutions. Trade off exploration with exploitation
  • First, create a simple, straightforward design that is functionally correct. [...] [...]
  • If an optimization breaks correctness, revert and try a different approach
  • Keep iterating until you run out of steps [...].

The API reference also foregrounds a recurring pitfall: Spire infers signal widths from expressions, meaning concatenations can silently mispack outputs:

### Signal Width Inference - CRITICAL for correct output packing
Spire automatically infers signal widths from arithmetic expressions. The result of an ad...

  • Identify the computationally intensive functions in your design (mux trees, normalization, rounding, etc.)
  • Wrap them with @mockturtle_optimized using small parameters first (iterations=1, mockturtle_chains=1, mockturtle_chain_len=2) to confirm the flow runs and helps
  • If optimization times out: either reduce the search budget (see tips above) or split the function into smaller sub-functions and decorate each one
  • Evaluate after each change to verify correctness and measure cost improvement.

For fpmul_f16 and fpadd_f16, the specification embedded in the system prompt points the agent at a provided, already-correct starting_point.py implementation in its workspace and instructs it to optimize from that baseline. These runs therefore begin from a functionally correct...

Named wire cut-points matter: explicitly declaring intermediate wires with specific widths helps Yosys partition and optimize logic independently