arxiv: 2604.22234 · v1 · submitted 2026-04-24 · 💻 cs.AR


GR-Evolve: Design-Adaptive Global Routing via LLM-Driven Algorithm Evolution

Taizun Jafri, Vidya A. Chhabria


Pith reviewed 2026-05-08 09:42 UTC · model grok-4.3

classification 💻 cs.AR

keywords: global routing · LLM · code evolution · EDA tools · ASIC design · wirelength reduction · design-adaptive routing · OpenROAD


The pith

An LLM can evolve the source code of a global router to cut post-detailed-routing wirelength by up to 8.72 percent on specific designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that fixed-algorithm EDA routers cannot adapt to the unique traits of each chip design, so they leave wirelength on the table even after hyperparameter tuning. GR-Evolve instead lets an agentic LLM rewrite the router's own source code, using quality-of-results feedback from an integrated OpenROAD toolchain to guide the changes. Experiments across seven benchmarks and three technology nodes show the evolved routers beat baseline implementations. A reader would care because modern ASIC flows are already expensive; automating algorithm specialization could shrink both manual effort and final interconnect cost.

Core claim

GR-Evolve equips an LLM with persistent knowledge of open-source global routers and an integrated QoR evaluation pipeline inside OpenROAD; the LLM then iteratively rewrites the routing code until post-detailed-routing wirelength improves, yielding up to 8.72 percent reduction relative to static baseline routers on seven designs.

What carries the argument

The GR-Evolve code-evolution loop, in which an LLM agent proposes, applies, and evaluates source-level changes to the global router guided by QoR metrics.
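
The propose, apply, evaluate cycle described above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `QoR`, `evolve`, `make_toy_propose`, and `toy_evaluate` are hypothetical names, the "LLM" is a stand-in that appends a numbered edit marker, and the "toolchain" is a toy function that returns made-up wirelength and DRC counts.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class QoR:
    wirelength: float    # post-detailed-routing wirelength (arbitrary units)
    drc_violations: int  # violations reported after detailed routing


def evolve(baseline_src: str,
           propose: Callable[[str, QoR], str],
           evaluate: Callable[[str], Optional[QoR]],
           iterations: int) -> Tuple[str, QoR]:
    """Greedy loop: keep a candidate only if it evaluates, is DRC-clean, and is shorter."""
    best_src = baseline_src
    best_qor = evaluate(baseline_src)
    if best_qor is None:
        raise ValueError("baseline router failed to evaluate")
    for _ in range(iterations):
        candidate = propose(best_src, best_qor)   # hypothetical LLM edit step
        qor = evaluate(candidate)                 # hypothetical build + routing flow
        if qor is None or qor.drc_violations > 0:
            continue                              # discard broken or dirty candidates
        if qor.wirelength < best_qor.wirelength:
            best_src, best_qor = candidate, qor
    return best_src, best_qor


def make_toy_propose() -> Callable[[str, QoR], str]:
    """Stand-in for the LLM agent: tags the source with a numbered edit marker."""
    counter = {"n": 0}

    def propose(src: str, qor: QoR) -> str:
        counter["n"] += 1
        return f"{src};edit{counter['n']}"

    return propose


def toy_evaluate(src: str) -> Optional[QoR]:
    """Stand-in for the toolchain: edit3 fails to build, edit5 causes a DRC violation."""
    edits = src.split(";")[1:]
    if "edit3" in edits:
        return None
    drc = 1 if "edit5" in edits else 0
    return QoR(wirelength=100.0 - len(edits), drc_violations=drc)


best_src, best_qor = evolve("base", make_toy_propose(), toy_evaluate, iterations=6)
print(best_src, best_qor)
```

In this toy run the failing edit and the DRC-violating edit are both rejected, so only clean, shorter candidates accumulate. A real loop would also need the checks the referee report asks for, since wirelength and DRC counts alone do not establish functional correctness.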

If this is right

Global routing can be specialized to each design's topology and constraints without manual hyperparameter search.
LLM-driven modification of router source code can outperform static heuristics that have been hand-tuned for decades.
Persistent context about multiple open-source routers lets the LLM make targeted algorithmic edits rather than random tweaks.
Integration with an open EDA flow enables closed-loop evaluation of each code change during evolution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same loop could be applied to other EDA stages whose source code is available, such as placement or clock-tree synthesis.
If the approach scales, design teams might shift from tuning tool knobs to supplying a design and letting the LLM produce a tailored router.
A practical next test would be whether the evolved code remains effective when the same design is re-run on a different technology node.

Load-bearing premise

The language model will keep producing code changes that are both functionally correct and actually better, without introducing subtle bugs or regressions that the quality checks overlook.

What would settle it

A benchmark run in which an LLM-generated router either violates design rules or produces higher total wirelength after detailed routing than the unmodified baseline on the same design.
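
The settling condition above can be written as a simple predicate over one benchmark run. The sketch below is illustrative only: the function names are hypothetical and the wirelength values are invented so that the arithmetic reproduces the 8.72 percent headline figure; they are not taken from the paper's tables.

```python
def percent_reduction(baseline_wl: float, evolved_wl: float) -> float:
    """Wirelength reduction of the evolved router relative to the baseline, in percent."""
    if baseline_wl <= 0:
        raise ValueError("baseline wirelength must be positive")
    return 100.0 * (baseline_wl - evolved_wl) / baseline_wl


def run_contradicts_claim(baseline_wl: float,
                          evolved_wl: float,
                          evolved_drc_violations: int) -> bool:
    """True if the run violates design rules or ends with more wirelength than the baseline."""
    return evolved_drc_violations > 0 or evolved_wl > baseline_wl


# Invented numbers, chosen only to illustrate the arithmetic.
baseline_wl = 1_000_000.0
evolved_wl = 912_800.0
reduction = percent_reduction(baseline_wl, evolved_wl)
print(f"{reduction:.2f}% reduction")
print(run_contradicts_claim(baseline_wl, evolved_wl, evolved_drc_violations=0))
```

A run with a shorter wirelength but a nonzero DRC count still counts against the claim, which is why the predicate checks both conditions.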

Figures

Figures reproduced from arXiv: 2604.22234 by Taizun Jafri, Vidya A. Chhabria.

**Figure 1.** Design-adaptive EDA tools via LLM-driven EDA

**Figure 3.** Knowledge base and context provided to GR-Evolve.

**Figure 4.** Pareto fronts of the search space of all 15 router-design pairs in ASAP7. Selected router QoR is reported in Table 2.

**Figure 5.** Pareto fronts of the search space for 20 of the 21 router-design pairs in Nangate45 (NG45); SPR_BP is omitted (but can …

**Figure 6.** Pareto fronts of the evolution search space for 8 of the 9 router-design pairs in SKY130HD; SPR_JPEG is omitted (but …

**Figure 7.** Base CUGR uses one sparse-grid configuration, …

Original abstract

Modern ASIC design is becoming increasingly complex, driving up design costs while limiting productivity gains from existing EDA tools. Despite decades of progress, current tools rely on fixed heuristics and offer limited control via tool hyperparameters, requiring extensive manual tuning to achieve an acceptable quality of results (QoR). While prior work has explored learning-based optimization and design-specific hyperparameter tuning, these approaches operate within the constraints of static tool algorithm implementations and do not adapt the underlying algorithms to individual designs. To address this limitation, we introduce the concept of design-adaptive EDA tooling, in which the internal algorithms of EDA tools are automatically specialized to the characteristics of a given design. We instantiate this paradigm through GR-Evolve, a code evolution framework that leverages an agentic large language model (LLM) to iteratively modify global routing source code using QoR-driven feedback. The framework equips the LLM with persistent contextual knowledge of open-source global routers along with an integrated toolchain for QoR evaluation within the OpenROAD infrastructure. We evaluate GR-Evolve across seven benchmark designs across three technology nodes and demonstrate up to 8.72% reduction in post-detailed-routing wirelength over existing baseline routers, highlighting the potential of LLM-driven EDA code evolution for design-adaptive global routing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GR-Evolve uses an LLM to edit global router code for design-specific gains, but the reported wirelength wins rest on unverified changes that could hide bugs.


The core idea here is using an LLM agent to iteratively rewrite parts of a global router's source code, guided by QoR feedback from OpenROAD, instead of just tuning parameters or learning policies on top of a fixed algorithm. That is the actual novelty: design-adaptive tooling that changes the implementation itself, evaluated across seven benchmarks and three nodes, with claims of up to 8.72% post-detailed-routing wirelength reduction over baselines. The setup includes persistent router knowledge in the LLM context and direct integration for evaluation, which is a practical step beyond abstract hyperparameter search papers. Credit to the authors for shipping a concrete pipeline rather than another simulation-only study. The integration with an open-source flow like OpenROAD makes it easier to reproduce the workflow in principle.

The main soft spot is exactly the one the stress-test flags: nothing in the abstract or available description shows code review, equivalence checks, or regression suites that would catch subtle connectivity, timing, or DRC violations introduced by the LLM edits. Wirelength alone does not prove the modified router is still functionally correct, and a single undetected regression on one of the seven designs would invalidate the gain on that design. There are no ablations on prompt design, no statistical significance tests on the improvements, and no breakdown of which code changes actually drove the results. The evaluation pipeline therefore leaves open the possibility that some reported wins are artifacts rather than real algorithmic specialization.

This work is for readers already working on LLM agents for code modification in EDA or similar domains who want to see an early instantiation. It is not yet ready for practitioners to adopt the evolved routers without heavy additional validation.

The idea deserves a serious referee because it opens a new direction with a working prototype; the current evidence is thin but the framing is clear enough that reviewers can ask for the missing checks on code correctness and experimental controls.

Referee Report

3 major / 2 minor

Summary. The paper introduces GR-Evolve, a code-evolution framework that uses an agentic LLM to iteratively modify the source code of open-source global routers, guided by QoR feedback within the OpenROAD toolchain. It claims this enables design-adaptive EDA tooling and reports up to 8.72% reduction in post-detailed-routing wirelength across seven benchmark designs spanning three technology nodes, relative to existing baseline routers.

Significance. If the empirical results hold after verification, the work could establish a new paradigm of LLM-driven algorithm specialization in EDA, moving beyond static heuristics and hyperparameter tuning to per-design code adaptation. The multi-design, multi-node evaluation provides a concrete starting point for assessing the practicality of this approach in global routing.

major comments (3)

[Evaluation] The central claim of up to 8.72% wirelength improvement rests on the unverified assumption that every LLM-proposed edit produces functionally correct routing code. No evidence of equivalence checking, code review, or regression suites beyond the primary wirelength metric is supplied; if any of the seven designs contains a latent connectivity, timing, or DRC violation introduced by the agent, the reported QoR gain is invalid.
[Evaluation] The evaluation reports results across seven designs and three nodes but provides no information on baseline router implementations, statistical significance testing, ablation studies isolating the LLM evolution components, or controls for confounding factors in the QoR measurement pipeline (e.g., OpenROAD version, detailed router settings, or runtime limits).
[Method] The framework description does not specify how the LLM's persistent contextual knowledge of open-source routers is constructed or maintained, nor does it address the risk that prompt design (a free parameter) could lead to non-reproducible or overfitted modifications.

minor comments (2)

[Introduction] The abstract and introduction could more clearly distinguish the proposed design-adaptive paradigm from prior learning-based hyperparameter tuning work.
[Evaluation] Figure captions and table headers should explicitly state the exact wirelength metric (e.g., post-detailed-routing total wirelength) and the precise baseline router versions used for each comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of verification, evaluation rigor, and methodological clarity that we will address in the revision. Below we respond point by point.


Referee: [Evaluation] The central claim of up to 8.72% wirelength improvement rests on the unverified assumption that every LLM-proposed edit produces functionally correct routing code. No evidence of equivalence checking, code review, or regression suites beyond the primary wirelength metric is supplied; if any of the seven designs contains a latent connectivity, timing, or DRC violation introduced by the agent, the reported QoR gain is invalid.

Authors: We agree that explicit documentation of functional correctness is essential. All reported results were obtained by executing the full OpenROAD global-plus-detailed routing flow, which enforces DRC and timing checks; any edit producing violations would have caused the flow to fail or report errors, and such cases were discarded. However, the manuscript did not describe this process or additional checks. In the revised version we will add a dedicated verification subsection that details: (1) post-evolution manual review of the principal code changes, (2) execution of available regression tests on the modified routers, and (3) confirmation that every reported QoR number corresponds to a run with zero DRC violations and satisfied timing constraints. This will directly substantiate that the observed wirelength gains are not artifacts of invalid routing solutions. revision: yes
Referee: [Evaluation] The evaluation reports results across seven designs and three nodes but provides no information on baseline router implementations, statistical significance testing, ablation studies isolating the LLM evolution components, or controls for confounding factors in the QoR measurement pipeline (e.g., OpenROAD version, detailed router settings, or runtime limits).

Authors: We concur that the current evaluation description is insufficiently detailed. The revised manuscript will expand the experimental section to include: (1) precise specifications of the baseline router implementations (OpenROAD commit hashes, configuration files, and command-line settings), (2) statistical significance testing (e.g., paired t-tests or Wilcoxon tests across repeated runs with different random seeds where applicable), (3) ablation studies that isolate the LLM-driven code-evolution component from other factors, and (4) explicit controls for confounding variables such as fixed OpenROAD version, detailed-router parameters, and runtime budgets. These additions will allow readers to assess the robustness of the reported improvements. revision: yes
Referee: [Method] The framework description does not specify how the LLM's persistent contextual knowledge of open-source routers is constructed or maintained, nor does it address the risk that prompt design (a free parameter) could lead to non-reproducible or overfitted modifications.

Authors: We will substantially expand the method section to describe the construction and maintenance of the LLM's persistent contextual knowledge, including the initial seeding with router source code, documentation excerpts, and API references, as well as how this context is updated across iterations. To mitigate concerns about prompt design, we will: (1) release the exact prompts used in all experiments, (2) discuss the prompt-engineering process and its rationale, and (3) present sensitivity results obtained with alternative prompt formulations. While prompt choice is an inherent hyperparameter of LLM-based methods, these disclosures will improve reproducibility and allow assessment of potential overfitting. revision: yes
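
The rebuttal promises paired significance tests across repeated runs with different seeds. As a rough illustration of what a paired comparison involves, the sketch below uses an exact sign-flip permutation test, which is a different technique from the t-test and Wilcoxon test the authors mention; it is chosen here only because it needs nothing beyond the Python standard library. The function name and the per-seed wirelength values are hypothetical and invented for the example.

```python
from itertools import product
from typing import Sequence


def paired_sign_flip_p_value(baseline: Sequence[float],
                             evolved: Sequence[float]) -> float:
    """One-sided exact p-value for the alternative that evolved wirelength is lower.

    Under the null hypothesis each paired difference is equally likely to carry
    either sign, so we enumerate all 2^n sign assignments and count how many give
    a total difference at least as large as the observed one.
    """
    if len(baseline) != len(evolved) or not baseline:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [b - e for b, e in zip(baseline, evolved)]
    observed = sum(diffs)
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Invented per-seed wirelengths for one design (arbitrary units).
baseline_runs = [100, 102, 101, 99, 103]
evolved_runs = [95, 97, 98, 96, 99]
p_value = paired_sign_flip_p_value(baseline_runs, evolved_runs)
print(p_value)
```

With five paired runs the smallest achievable one-sided p-value is 1/32, so a handful of seeds per design bounds how strong the statistical evidence can be. Full enumeration is only practical for small sample counts.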

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks.


The paper introduces GR-Evolve as an LLM-based framework for evolving global routing code and reports empirical QoR improvements (up to 8.72% wirelength reduction) measured against independent baseline routers on seven external benchmark designs across technology nodes. No equations, derivations, fitted parameters, or self-referential definitions appear in the provided text. The central result is not obtained by construction from the method's own outputs or prior self-citations; it depends on external evaluation infrastructure (OpenROAD) and baseline comparisons. Self-citations, if present, are not load-bearing for the headline claim. This is the expected non-finding for an empirical systems paper whose validity hinges on experimental reproducibility rather than internal mathematical closure.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the unverified premise that current LLMs possess sufficient domain understanding to edit routing algorithms productively and that QoR-driven selection will converge to functionally correct improvements.

free parameters (1)

LLM context and prompt design
The persistent knowledge provided to the agent and the exact feedback loop structure are chosen by the authors and directly affect evolution success.

axioms (1)

domain assumption: LLM agents can understand and safely modify production-grade global routing source code
Invoked when the framework equips the LLM with router context and expects iterative code changes to improve QoR.

invented entities (1)

design-adaptive EDA tooling (no independent evidence)
purpose: To automatically specialize internal EDA algorithms to individual designs instead of using fixed implementations
New concept introduced in the abstract as the motivating paradigm.

pith-pipeline@v0.9.0 · 5522 in / 1307 out tokens · 22324 ms · 2026-05-08T09:42:53.635604+00:00 · methodology

