GR-Evolve: Design-Adaptive Global Routing via LLM-Driven Algorithm Evolution
Pith reviewed 2026-05-08 09:42 UTC · model grok-4.3
The pith
An LLM can evolve the source code of a global router to cut post-detailed-routing wirelength by up to 8.72 percent on specific designs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GR-Evolve equips an LLM with persistent knowledge of open-source global routers and an integrated QoR evaluation pipeline inside OpenROAD. The LLM then iteratively rewrites the routing code under QoR feedback, yielding up to an 8.72 percent reduction in post-detailed-routing wirelength relative to static baseline routers across seven benchmark designs and three technology nodes.
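The headline percentage is presumably the usual relative reduction against the baseline. The text here does not quote the paper's exact definition, so the formula below is an assumption, and the symbols WL_base and WL_evolved are illustrative names for the post-detailed-routing wirelength of the baseline and evolved routers on the same design.

```latex
\Delta_{\mathrm{WL}}
  = \frac{\mathrm{WL}_{\mathrm{base}} - \mathrm{WL}_{\mathrm{evolved}}}{\mathrm{WL}_{\mathrm{base}}} \times 100\%
```

Under this reading, a reduction of 8.72 percent means the evolved router's wirelength is 91.28 percent of the baseline's on that design.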
What carries the argument
The GR-Evolve code-evolution loop, in which an LLM agent proposes, applies, and evaluates source-level changes to the global router guided by QoR metrics.
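A minimal sketch of such a propose, apply, and evaluate loop is given below. It is not the authors' implementation: the `QoR` record, the `propose` and `evaluate` callables, and the greedy accept rule are all hypothetical stand-ins for the LLM agent and the OpenROAD-based evaluation pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QoR:
    """Quality-of-results record for one routed design (hypothetical schema)."""
    wirelength: float      # post-detailed-routing wirelength
    drc_violations: int    # violations reported by the flow


def evolve(
    source: str,
    propose: Callable[[str, QoR], str],        # stand-in for the LLM agent
    evaluate: Callable[[str], Optional[QoR]],  # stand-in for the routing flow; None = failed run
    iterations: int,
) -> tuple[str, QoR]:
    """Greedy loop: keep a candidate only if it runs, is DRC-clean, and is shorter."""
    best_qor = evaluate(source)
    if best_qor is None:
        raise ValueError("baseline router failed to evaluate")
    best_source = source
    for _ in range(iterations):
        candidate = propose(best_source, best_qor)
        qor = evaluate(candidate)
        if qor is None or qor.drc_violations > 0:
            continue  # discard broken or rule-violating candidates
        if qor.wirelength < best_qor.wirelength:
            best_source, best_qor = candidate, qor
    return best_source, best_qor
```

The greedy accept rule is the simplest choice for illustration; an actual framework might keep a population of candidates or accept temporary regressions.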
If this is right
- Global routing can be specialized to each design's topology and constraints without manual hyperparameter search.
- LLM-driven modification of router source code can outperform static heuristics that have been hand-tuned for decades.
- Persistent context about multiple open-source routers lets the LLM make targeted algorithmic edits rather than random tweaks.
- Integration with an open EDA flow enables closed-loop evaluation of each code change during evolution.
Where Pith is reading between the lines
- The same loop could be applied to other EDA stages whose source code is available, such as placement or clock-tree synthesis.
- If the approach scales, design teams might shift from tuning tool knobs to supplying a design and letting the LLM produce a tailored router.
- A practical next test would be whether the evolved code remains effective when the same design is re-run on a different technology node.
Load-bearing premise
The language model will keep producing code changes that are both functionally correct and actually better, without introducing subtle bugs or regressions that the quality checks overlook.
What would settle it
A benchmark run in which an LLM-generated router either violates design rules or produces higher total wirelength after detailed routing than the unmodified baseline on the same design.
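The settling condition above can be written as a small predicate. This is only a sketch: `RunResult` and its fields are hypothetical names for this example, not an OpenROAD or GR-Evolve data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Outcome of one full routing run (hypothetical record, not a tool schema)."""
    design: str
    wirelength: float     # total wirelength after detailed routing
    drc_violations: int


def refutes_claim(baseline: RunResult, evolved: RunResult) -> bool:
    """True if the evolved run violates design rules or is longer than the baseline."""
    if baseline.design != evolved.design:
        raise ValueError("runs must be on the same design")
    return evolved.drc_violations > 0 or evolved.wirelength > baseline.wirelength
```

A single pair for which `refutes_claim` returns `True` would be the kind of counterexample described above.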
Original abstract
Modern ASIC design is becoming increasingly complex, driving up design costs while limiting productivity gains from existing EDA tools. Despite decades of progress, current tools rely on fixed heuristics and offer limited control via tool hyperparameters, requiring extensive manual tuning to achieve an acceptable quality of results (QoR). While prior work has explored learning-based optimization and design-specific hyperparameter tuning, these approaches operate within the constraints of static tool algorithm implementations and do not adapt the underlying algorithms to individual designs. To address this limitation, we introduce the concept of design-adaptive EDA tooling, in which the internal algorithms of EDA tools are automatically specialized to the characteristics of a given design. We instantiate this paradigm through GR-Evolve, a code evolution framework that leverages an agentic large language model (LLM) to iteratively modify global routing source code using QoR-driven feedback. The framework equips the LLM with persistent contextual knowledge of open-source global routers along with an integrated toolchain for QoR evaluation within the OpenROAD infrastructure. We evaluate GR-Evolve across seven benchmark designs across three technology nodes and demonstrate up to 8.72% reduction in post-detailed-routing wirelength over existing baseline routers, highlighting the potential of LLM-driven EDA code evolution for design-adaptive global routing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GR-Evolve, a code-evolution framework that uses an agentic LLM to iteratively modify the source code of open-source global routers, guided by QoR feedback within the OpenROAD toolchain. It claims this enables design-adaptive EDA tooling and reports up to 8.72% reduction in post-detailed-routing wirelength across seven benchmark designs spanning three technology nodes, relative to existing baseline routers.
Significance. If the empirical results hold after verification, the work could establish a new paradigm of LLM-driven algorithm specialization in EDA, moving beyond static heuristics and hyperparameter tuning to per-design code adaptation. The multi-design, multi-node evaluation provides a concrete starting point for assessing the practicality of this approach in global routing.
major comments (3)
- [Evaluation] The central claim of up to 8.72% wirelength improvement rests on the unverified assumption that every LLM-proposed edit produces functionally correct routing code. No evidence of equivalence checking, code review, or regression suites beyond the primary wirelength metric is supplied; if any of the seven designs contains a latent connectivity, timing, or DRC violation introduced by the agent, the reported QoR gain is invalid.
- [Evaluation] The evaluation reports results across seven designs and three nodes but provides no information on baseline router implementations, statistical significance testing, ablation studies isolating the LLM evolution components, or controls for confounding factors in the QoR measurement pipeline (e.g., OpenROAD version, detailed router settings, or runtime limits).
- [Method] The framework description does not specify how the LLM's persistent contextual knowledge of open-source routers is constructed or maintained, nor does it address the risk that prompt design (a free parameter) could lead to non-reproducible or overfitted modifications.
minor comments (2)
- [Introduction] The abstract and introduction could more clearly distinguish the proposed design-adaptive paradigm from prior learning-based hyperparameter tuning work.
- [Evaluation] Figure captions and table headers should explicitly state the exact wirelength metric (e.g., post-detailed-routing total wirelength) and the precise baseline router versions used for each comparison.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of verification, evaluation rigor, and methodological clarity that we will address in the revision. Below we respond point by point.
- Referee: [Evaluation] The central claim of up to 8.72% wirelength improvement rests on the unverified assumption that every LLM-proposed edit produces functionally correct routing code. No evidence of equivalence checking, code review, or regression suites beyond the primary wirelength metric is supplied; if any of the seven designs contains a latent connectivity, timing, or DRC violation introduced by the agent, the reported QoR gain is invalid.
  Authors: We agree that explicit documentation of functional correctness is essential. All reported results were obtained by executing the full OpenROAD global-plus-detailed routing flow, which enforces DRC and timing checks; any edit producing violations would have caused the flow to fail or report errors, and such cases were discarded. However, the manuscript did not describe this process or additional checks. In the revised version we will add a dedicated verification subsection that details: (1) post-evolution manual review of the principal code changes, (2) execution of available regression tests on the modified routers, and (3) confirmation that every reported QoR number corresponds to a run with zero DRC violations and satisfied timing constraints. This will directly substantiate that the observed wirelength gains are not artifacts of invalid routing solutions.
  Revision: yes
- Referee: [Evaluation] The evaluation reports results across seven designs and three nodes but provides no information on baseline router implementations, statistical significance testing, ablation studies isolating the LLM evolution components, or controls for confounding factors in the QoR measurement pipeline (e.g., OpenROAD version, detailed router settings, or runtime limits).
  Authors: We concur that the current evaluation description is insufficiently detailed. The revised manuscript will expand the experimental section to include: (1) precise specifications of the baseline router implementations (OpenROAD commit hashes, configuration files, and command-line settings), (2) statistical significance testing (e.g., paired t-tests or Wilcoxon tests across repeated runs with different random seeds where applicable), (3) ablation studies that isolate the LLM-driven code-evolution component from other factors, and (4) explicit controls for confounding variables such as fixed OpenROAD version, detailed-router parameters, and runtime budgets. These additions will allow readers to assess the robustness of the reported improvements.
  Revision: yes
- Referee: [Method] The framework description does not specify how the LLM's persistent contextual knowledge of open-source routers is constructed or maintained, nor does it address the risk that prompt design (a free parameter) could lead to non-reproducible or overfitted modifications.
  Authors: We will substantially expand the method section to describe the construction and maintenance of the LLM's persistent contextual knowledge, including the initial seeding with router source code, documentation excerpts, and API references, as well as how this context is updated across iterations. To mitigate concerns about prompt design, we will: (1) release the exact prompts used in all experiments, (2) discuss the prompt-engineering process and its rationale, and (3) present sensitivity results obtained with alternative prompt formulations. While prompt choice is an inherent hyperparameter of LLM-based methods, these disclosures will improve reproducibility and allow assessment of potential overfitting.
  Revision: yes
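The significance testing promised in the second response could take several forms. The sketch below uses an exact paired sign-flip permutation test, which is a swapped-in technique and not the paired t-test or Wilcoxon test the authors name; it is chosen only because it needs nothing beyond the standard library. The function name and inputs are hypothetical, and no numbers from the paper are used.

```python
from itertools import product


def paired_sign_flip_pvalue(baseline: list[float], evolved: list[float]) -> float:
    """One-sided exact p-value for 'evolved wirelength is lower than baseline'."""
    if len(baseline) != len(evolved) or not baseline:
        raise ValueError("need equally sized, non-empty paired samples")
    n = len(baseline)
    diffs = [b - e for b, e in zip(baseline, evolved)]  # positive = improvement
    observed = sum(diffs) / n
    hits = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate all 2**n sign assignments.
    for signs in product((1, -1), repeat=n):
        total += 1
        flipped_mean = sum(s * d for s, d in zip(signs, diffs)) / n
        if flipped_mean >= observed - 1e-12:  # small tolerance for float ties
            hits += 1
    return hits / total
```

With seven paired designs the enumeration covers 128 sign assignments, so the smallest attainable one-sided p-value is 1/128. That limited resolution is one reason repeated runs with different seeds, as the authors propose, would strengthen the analysis.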
Circularity Check
No significant circularity; empirical claims rest on external benchmarks.
The paper introduces GR-Evolve as an LLM-based framework for evolving global routing code and reports empirical QoR improvements (up to 8.72% wirelength reduction) measured against independent baseline routers on seven external benchmark designs across technology nodes. No equations, derivations, fitted parameters, or self-referential definitions appear in the provided text. The central result is not obtained by construction from the method's own outputs or prior self-citations; it depends on external evaluation infrastructure (OpenROAD) and baseline comparisons. Self-citations, if present, are not load-bearing for the headline claim. This is the expected non-finding for an empirical systems paper whose validity hinges on experimental reproducibility rather than internal mathematical closure.
Axiom & Free-Parameter Ledger
free parameters (1)
- LLM context and prompt design
axioms (1)
- domain assumption: LLM agents can understand and safely modify production-grade global routing source code
invented entities (1)
- design-adaptive EDA tooling (no independent evidence)