arxiv: 2604.26591 · v1 · submitted 2026-04-29 · 💻 cs.CE · cs.AI

Recognition: unknown

MappingEvolve: LLM-Driven Code Evolution for Technology Mapping

Qiang Xu, Rongliang Fu, Tsung-Yi Ho, Yi Liu

Pith reviewed 2026-05-07 11:46 UTC · model grok-4.3

classification 💻 cs.CE cs.AI

keywords technology mapping · logic synthesis · large language models · code evolution · multi-agent systems · area optimization · benchmark evaluation


The pith

Large language models can evolve the code of technology mapping algorithms to produce smaller mapped circuits than established tools while explicitly managing the area-delay trade-off.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MappingEvolve, a framework that uses LLMs to iteratively improve the source code responsible for technology mapping in logic synthesis flows. It decomposes the mapping task into individual optimization operators and orchestrates changes through a Planner that sets goals, an Evolver that proposes code edits, and an Evaluator that scores the outcomes, so the system performs a guided search for better implementations. The evolved code reduces circuit area by 10.04 percent relative to the ABC tool and 7.93 percent relative to mockturtle on the EPFL benchmarks, alongside 46.6 to 96.0 percent improvements in an overall score, S_overall. The framework also includes mechanisms to navigate the inherent trade-off between circuit size and speed. A reader should care because technology mapping quality directly affects the size, power, and performance of the hardware that synthesis tools produce.

Core claim

MappingEvolve introduces the use of LLMs to directly evolve technology mapping code rather than merely generating scripts. The method first abstracts the mapping process into distinct optimization operators. It then deploys a hierarchical agent-based architecture consisting of a Planner, an Evolver, and an Evaluator to strategically guide the evolutionary search for code modifications. Experiments demonstrate that this approach significantly outperforms both direct LLM evolution and strong baselines, delivering 10.04% area reduction versus ABC and 7.93% versus mockturtle, with 46.6% to 96.0% improvement in overall score on EPFL benchmarks while explicitly handling the area-delay trade-off.
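The headline percentages do not say how per-benchmark results are aggregated into one number, and the choice matters. The sketch below is illustrative only: it uses three hypothetical designs with made-up areas (not the paper's data) to compare three common conventions, namely the mean of per-design reductions, the reduction in total area, and one minus the geometric mean of area ratios. On the same numbers they give roughly 8.7%, 1.5%, and 9.0%.

```python
from math import prod


def reduction(base: float, new: float) -> float:
    """Fractional area reduction of `new` relative to `base` (0.10 means 10%)."""
    return (base - new) / base


def mean_of_reductions(base: list[float], new: list[float]) -> float:
    """Every design weighs the same, regardless of its size."""
    return sum(reduction(b, n) for b, n in zip(base, new)) / len(base)


def total_area_reduction(base: list[float], new: list[float]) -> float:
    """Large designs dominate because areas are summed first."""
    return reduction(sum(base), sum(new))


def geomean_reduction(base: list[float], new: list[float]) -> float:
    """One minus the geometric mean of per-design area ratios."""
    ratios = [n / b for b, n in zip(base, new)]
    return 1 - prod(ratios) ** (1 / len(ratios))


# Hypothetical areas for three designs; NOT results from the paper.
baseline = [100.0, 1000.0, 10000.0]
evolved = [80.0, 950.0, 9900.0]

for name, fn in [("mean of reductions", mean_of_reductions),
                 ("total-area reduction", total_area_reduction),
                 ("geomean-ratio reduction", geomean_reduction)]:
    print(f"{name}: {fn(baseline, evolved):.1%}")
```

A small design with a large relative gain moves the first and third conventions a lot but barely moves the second, which is why a stated convention is part of a reproducible protocol.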

What carries the argument

The hierarchical agent-based architecture (Planner, Evolver, Evaluator) that directs LLM modifications to abstracted optimization operators in technology mapping code.
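The division of labour between the three agents can be pictured as a small search loop. The sketch below is hypothetical throughout: the "code" being evolved is reduced to one integer knob per operator, the Evolver is a random perturbation instead of an LLM call, and both the correctness gate and the area are toy functions. Only the operator names come from the paper (the Section 2.3 excerpt quoted with Figure 1). What it shows is the control flow described here: plan a target, propose an edit, reject it unless it passes a correctness check, and keep it only if the score improves.

```python
import random
from typing import Optional

# Operator names follow the paper's excerpt; everything else is a toy stand-in.
OPERATORS = ("MatchPhase", "MatchPhaseExact", "MatchDropPhase")

# Hypothetical hidden optimum for each operator's single tunable "knob".
_HIDDEN_BEST = {"MatchPhase": 3, "MatchPhaseExact": 7, "MatchDropPhase": 5}


def toy_area(cand: dict[str, int]) -> int:
    """Stand-in for running the mapper and reading back area (lower is better)."""
    return 100 + sum((cand[op] - _HIDDEN_BEST[op]) ** 2 for op in OPERATORS)


def toy_is_correct(cand: dict[str, int]) -> bool:
    """Stand-in for an equivalence check: knobs outside [0, 10] count as broken code."""
    return all(0 <= v <= 10 for v in cand.values())


class Planner:
    """Picks the operator with the fewest consecutive non-improving attempts."""

    def __init__(self) -> None:
        self.stalls = {op: 0 for op in OPERATORS}

    def pick(self) -> str:
        return min(OPERATORS, key=lambda op: self.stalls[op])

    def report(self, op: str, improved: bool) -> None:
        self.stalls[op] = 0 if improved else self.stalls[op] + 1


def evolver(parent: dict[str, int], op: str, rng: random.Random) -> dict[str, int]:
    """Stand-in for an LLM edit: perturb only the operator chosen by the Planner."""
    child = dict(parent)
    child[op] += rng.choice((-3, -2, -1, 1, 2, 3))
    return child


def evaluator(cand: dict[str, int]) -> Optional[int]:
    """Reject candidates that fail the correctness gate; otherwise return their score."""
    return toy_area(cand) if toy_is_correct(cand) else None


def evolve(rounds: int = 200, seed: int = 0):
    rng = random.Random(seed)
    best = {op: 0 for op in OPERATORS}
    best_score = evaluator(best)
    planner = Planner()
    history = [best_score]
    for _ in range(rounds):
        op = planner.pick()
        child = evolver(best, op, rng)
        score = evaluator(child)
        improved = score is not None and score < best_score
        if improved:
            best, best_score = child, score
        planner.report(op, improved)
        history.append(best_score)
    return best, best_score, history


best, best_score, history = evolve()
print(best, best_score)
```

Rejected candidates never reach the scoring step, so the reported best score can only come from a candidate that passed the gate; how strong that guarantee is depends entirely on what the real gate checks.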

Load-bearing premise

LLM-suggested modifications to the mapping code always preserve functional correctness and do not introduce subtle bugs that only appear on certain inputs.
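A hypothetical example (not from the paper) makes the premise concrete. An "evolved" parity function that is wrong only on the all-ones input passes simulation on 32 sampled patterns, yet exhaustive enumeration finds the counterexample immediately. Exhaustive checking is only feasible for a handful of inputs; for realistic designs, SAT-based combinational equivalence checking plays that role.

```python
import random

N = 8  # number of primary inputs in this toy example


def reference(bits: tuple[int, ...]) -> int:
    """Golden function: XOR (parity) of all inputs."""
    return sum(bits) % 2


def evolved(bits: tuple[int, ...]) -> int:
    """Hypothetical 'optimized' version with a corner-case bug on the all-ones input."""
    if all(bits):
        return 1 - reference(bits)
    return reference(bits)


def to_bits(v: int) -> tuple[int, ...]:
    return tuple((v >> i) & 1 for i in range(N))


def first_mismatch(f, g, patterns):
    """Simulate f and g on integer-encoded input patterns; return the first counterexample."""
    for v in patterns:
        bits = to_bits(v)
        if f(bits) != g(bits):
            return bits
    return None


rng = random.Random(0)
# 32 random patterns drawn from 0..254, so the all-ones pattern (255) is never sampled.
sampled = rng.sample(range(2 ** N - 1), 32)

sim_cex = first_mismatch(reference, evolved, sampled)        # the bug slips through
full_cex = first_mismatch(reference, evolved, range(2 ** N))  # exhaustive: the bug is found
print("simulation counterexample:", sim_cex)
print("exhaustive counterexample:", full_cex)
```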

What would settle it

Demonstrating that an evolved mapping implementation either produces logically incorrect results for some circuit or fails to deliver area or delay improvements when applied to a new benchmark set not involved in the evolution process.

Figures

Figures reproduced from arXiv: 2604.26591 by Qiang Xu, Rongliang Fu, Tsung-Yi Ho, Yi Liu.

**Figure 1.** The overall flow of MappingEvolve. After each round, the algorithm updates the mapping coverage and computes required arrival times to guide subsequent rounds (Lines 10-11). Excerpt from Section 2.3, LLM-driven Evolution Target Selection: The three operators identified above, O = {MatchPhase, MatchPhaseExact, MatchDropPhase}, encapsulate the core optimization logic of technology mapping algorithms. We select these operators as i…
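The caption's step of computing required times can be illustrated with generic static timing on a mapped DAG: arrival times propagate forward, required times propagate backward from the critical delay, and slack is their difference. This is a sketch under a simple additive delay model with a hypothetical netlist; it is not MappingEvolve's code, and the caption only summarizes how the required times guide later rounds. In cut-based mappers, nodes with positive slack are typically where area-recovery rounds may choose smaller but slower matches.

```python
from graphlib import TopologicalSorter

# Hypothetical mapped netlist: node -> (gate delay, fanins). Primary inputs have no fanins.
NETLIST = {
    "a": (0.0, []), "b": (0.0, []), "c": (0.0, []),
    "n1": (2.0, ["a", "b"]),
    "n2": (1.0, ["b", "c"]),
    "n3": (3.0, ["n1", "n2"]),
    "n4": (1.0, ["n2"]),
}
OUTPUTS = ["n3", "n4"]


def timing(netlist, outputs):
    """Return (arrival, required, slack) dicts for a combinational DAG."""
    # TopologicalSorter takes node -> predecessors and yields fanins before fanouts.
    order = list(TopologicalSorter({n: fi for n, (_, fi) in netlist.items()}).static_order())

    # Forward pass: arrival = own delay + latest fanin arrival (inputs arrive at 0).
    arrival = {}
    for n in order:
        delay, fanins = netlist[n]
        arrival[n] = delay + max((arrival[f] for f in fanins), default=0.0)

    # Every output is required at the critical-path delay.
    target = max(arrival[o] for o in outputs)
    required = {n: float("inf") for n in netlist}
    for o in outputs:
        required[o] = target

    # Backward pass: a fanin must arrive early enough for every fanout to meet its required time.
    for n in reversed(order):
        delay, fanins = netlist[n]
        for f in fanins:
            required[f] = min(required[f], required[n] - delay)

    slack = {n: required[n] - arrival[n] for n in netlist}
    return arrival, required, slack


arrival, required, slack = timing(NETLIST, OUTPUTS)
for n in NETLIST:
    print(f"{n}: arrival={arrival[n]}, required={required[n]}, slack={slack[n]}")
```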

Original abstract

Technology mapping is a critical yet challenging stage in logic synthesis. While Large Language Models (LLMs) have been applied to generate optimization scripts, their potential for core algorithm enhancement remains untapped. We introduce MappingEvolve, an open-source framework that pioneers the use of LLMs to directly evolve technology mapping code. Our method abstracts the mapping process into distinct optimization operators and employs a hierarchical agent-based architecture, comprising a Planner, Evolver, and Evaluator, to guide the evolutionary search. This structured approach enables strategic and effective code modifications. Experiments show our method significantly outperforms direct evolution and strong baselines, achieving 10.04% area reduction versus ABC and 7.93% versus mockturtle, with 46.6%–96.0% S_overall improvement on EPFL benchmarks, while explicitly navigating the area–delay trade-off. Our code and data are available at https://github.com/Flians/MappingEvolve.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gets LLMs to evolve the core technology mapping code via a planner-evolver-evaluator setup and reports area gains on EPFL benchmarks, but the functional equivalence of those changes is not shown to be rigorously enforced.


The punchline is that MappingEvolve applies LLMs to rewrite the technology mapping algorithm itself rather than just generating scripts. They break the mapper into abstracted operators, then run a three-agent loop where a planner picks what to change, an evolver writes the new code, and an evaluator scores area and delay. This produces the claimed 10% area cut versus ABC and 8% versus mockturtle on the public EPFL set while also trading off delay. The code is released, which is useful for anyone who wants to inspect or rerun it.

Referee Report

3 major / 2 minor

Summary. The paper introduces MappingEvolve, an open-source framework that uses LLMs in a hierarchical agent architecture (Planner, Evolver, Evaluator) to evolve technology mapping code for logic synthesis. It abstracts mapping into optimization operators and claims that the evolved code significantly outperforms direct evolution and baselines, delivering 10.04% area reduction versus ABC and 7.93% versus mockturtle, along with 46.6%–96.0% S_overall improvement on EPFL benchmarks while navigating the area-delay trade-off.

Significance. If the central performance claims hold after verification, the work would be significant for demonstrating a structured LLM-driven approach to core algorithm evolution in technology mapping, an area where prior LLM uses have been limited to script generation. The open-source release and explicit handling of trade-offs are strengths that could enable follow-on research in LLM-augmented EDA tools.

major comments (3)

[Abstract and Experiments section] The central claims of 10.04% area reduction versus ABC and 7.93% versus mockturtle rest on the assumption that LLM-evolved mapping operators produce functionally equivalent netlists. No description is given of how equivalence is enforced (e.g., via structural hashing plus SAT-based CEC versus simulation on EPFL vectors only). Without this, reported gains could arise from logic-altering simplifications that fail on unseen designs.
[Experiments section] The abstract states clear percentage gains, yet the paper supplies no experimental protocol, statistical tests, baseline code versions, controls for data leakage, or post-hoc selection criteria. This leaves the outperformance claim weakly supported and prevents assessment of whether the improvements are robust or reproducible.
[Evaluator description (hierarchical architecture)] The burden of ensuring functional correctness is placed on the Evaluator, but the manuscript provides no evidence that it performs formal verification rather than trusting LLM-generated code or single-run mapping scores. This assumption underpins both the area/delay figures and the S_overall metric.
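The kind of statistical reporting the second comment asks for can be sketched in a few lines: mean and sample standard deviation per design across independent runs, then an exact two-sided sign test on whether the evolved mapper beats the baseline across designs. All numbers below are hypothetical (not from the paper), and the sign test is one simple choice; a Wilcoxon signed-rank test, which also uses the size of each difference, is a common alternative.

```python
from math import comb
from statistics import mean, stdev


def sign_test_p(wins: int, losses: int) -> float:
    """Exact two-sided sign test; ties are dropped beforehand. H0: P(win) = 0.5."""
    n = wins + losses
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: baseline area per design and evolved-mapper area over 3 seeds.
baseline = {"design_a": 100.0, "design_b": 200.0, "design_c": 300.0,
            "design_d": 400.0, "design_e": 500.0, "design_f": 600.0}
runs = {
    "design_a": [95.0, 96.0, 97.0],
    "design_b": [190.0, 192.0, 194.0],
    "design_c": [303.0, 305.0, 304.0],
    "design_d": [380.0, 390.0, 385.0],
    "design_e": [490.0, 495.0, 485.0],
    "design_f": [590.0, 580.0, 585.0],
}

# Mean and sample standard deviation across independent runs, per design.
summary = {d: (mean(r), stdev(r)) for d, r in runs.items()}
wins = sum(summary[d][0] < baseline[d] for d in runs)
losses = sum(summary[d][0] > baseline[d] for d in runs)
p_value = sign_test_p(wins, losses)

for d, (m, s) in summary.items():
    print(f"{d}: {m:.1f} ± {s:.1f} (baseline {baseline[d]:.1f})")
print(f"wins={wins}, losses={losses}, two-sided sign-test p={p_value:.3f}")
```

With only six designs, five wins out of six gives p ≈ 0.22, which is not significant at the 5% level; a per-design protocol makes this kind of limitation visible instead of hiding it inside one aggregate percentage.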

minor comments (2)

[Abstract] The notation S_overall is introduced without an explicit equation or definition in the abstract; a clear formula should be added.
[Abstract] The GitHub link is provided but the manuscript does not specify which commit or release corresponds to the reported results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments, which highlight important aspects of verification, experimental rigor, and the Evaluator component. We agree that these areas require clarification and expansion to strengthen the manuscript. We will make revisions to address each point, as detailed below.


Referee: [Abstract and Experiments section] The central claims of 10.04% area reduction versus ABC and 7.93% versus mockturtle rest on the assumption that LLM-evolved mapping operators produce functionally equivalent netlists. No description is given of how equivalence is enforced (e.g., via structural hashing plus SAT-based CEC versus simulation on EPFL vectors only). Without this, reported gains could arise from logic-altering simplifications that fail on unseen designs.

Authors: We acknowledge that the manuscript does not provide an explicit description of the equivalence enforcement mechanism. In the implemented Evaluator, functional equivalence is checked via structural hashing combined with simulation on the provided EPFL benchmark vectors prior to accepting any evolved operator for scoring. However, we agree this is insufficiently documented and could leave open the possibility of non-equivalent simplifications. We will add a new subsection in the Experiments section detailing the verification procedure, including the exact methods used, any limitations with respect to unseen designs, and plans to incorporate SAT-based CEC in future iterations. revision: yes
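Structural hashing, the first half of the procedure described in this response, merges AND nodes whose (order-normalized) fanins are identical and folds trivial cases such as x AND NOT x. The minimal and-inverter graph (AIG) below is an illustrative sketch, independent of MappingEvolve's actual Evaluator; in ABC the corresponding commands are `strash` and, for formal checking, `cec`. Hashing only detects syntactically identical structure, so two differently structured but equivalent netlists still need simulation or SAT-based CEC to be compared.

```python
from dataclasses import dataclass, field


@dataclass
class Aig:
    """Minimal AIG with structural hashing.

    A literal is 2 * node + complement bit. Node 0 is the constant, so literal 0
    means false and literal 1 means true. Primary inputs have no fanins (None).
    """
    fanins: list = field(default_factory=lambda: [None])  # node -> (lit0, lit1) or None
    table: dict = field(default_factory=dict)             # (lit0, lit1) -> AND node id

    def add_pi(self) -> int:
        self.fanins.append(None)
        return 2 * (len(self.fanins) - 1)

    def add_and(self, a: int, b: int) -> int:
        if a > b:
            a, b = b, a          # canonical order, so AND(a, b) and AND(b, a) hash alike
        if a == 0:               # x AND false = false
            return 0
        if a == 1:               # x AND true = x
            return b
        if a == b:               # x AND x = x
            return a
        if a == b ^ 1:           # x AND NOT x = false
            return 0
        key = (a, b)
        if key not in self.table:  # create a node only if this structure is new
            self.fanins.append(key)
            self.table[key] = len(self.fanins) - 1
        return 2 * self.table[key]


aig = Aig()
x, y = aig.add_pi(), aig.add_pi()
g1 = aig.add_and(x, y)
g2 = aig.add_and(y, x)        # same structure with swapped operands: reuses g1's node
g3 = aig.add_and(x, x ^ 1)    # contradiction folds to constant false
g4 = aig.add_and(g1, 1)       # AND with constant true returns g1 unchanged
num_and_nodes = len(aig.table)
print(g1, g2, g3, g4, num_and_nodes)
```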
Referee: [Experiments section] The abstract states clear percentage gains, yet the paper supplies no experimental protocol, statistical tests, baseline code versions, controls for data leakage, or post-hoc selection criteria. This leaves the outperformance claim weakly supported and prevents assessment of whether the improvements are robust or reproducible.

Authors: We concur that the current Experiments section lacks sufficient detail on the protocol, which weakens the support for the reported gains. We will substantially expand this section to include: the complete evolutionary run protocol (number of iterations, population sizes, LLM prompts used); statistical reporting with means, standard deviations, and significance tests across multiple independent runs; exact versions and configurations of ABC and mockturtle baselines; explicit controls for data leakage (fixed benchmark splits with no overlap between evolution and evaluation sets); and post-hoc selection criteria for the final reported operators. These additions will enable full reproducibility and allow readers to assess robustness. revision: yes
Referee: [Evaluator description (hierarchical architecture)] The burden of ensuring functional correctness is placed on the Evaluator, but the manuscript provides no evidence that it performs formal verification rather than trusting LLM-generated code or single-run mapping scores. This assumption underpins both the area/delay figures and the S_overall metric.

Authors: The manuscript describes the Evaluator's role in scoring but does not supply concrete evidence or implementation details confirming formal verification steps beyond single-run mapping. We will revise the hierarchical architecture description to explicitly state the verification steps performed by the Evaluator (cross-referencing the new verification subsection) and include experimental evidence that all reported area/delay and S_overall results were obtained only after passing equivalence checks. Where formal methods such as SAT-based CEC were not applied in the current experiments, we will note this limitation transparently and discuss its implications for the claims. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks


The paper presents an empirical LLM-based framework (Planner-Evolver-Evaluator) for evolving technology-mapping operators and reports area/delay gains on public EPFL benchmarks versus independently developed external tools (ABC, mockturtle). No mathematical derivation, fitted-parameter prediction, or self-referential equation chain is described; performance figures are obtained by direct comparison to outside references rather than by construction from quantities internal to the method. The architecture and results are therefore self-contained against independent evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract presents an empirical framework without stating mathematical axioms, fitted constants, or new physical entities; claims rest on LLM capabilities and agent roles taken as given.

pith-pipeline@v0.9.0 · 5466 in / 1134 out tokens · 89095 ms · 2026-05-07T11:46:42.976295+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 3 canonical work pages · 3 internal anchors

[1] S.-Y. Lee and G. De Micheli, “Heuristic logic resynthesis algorithms at the core of peephole optimization,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 42, no. 11, pp. 3958–3971, 2023.

[2] A. T. Calvino and G. De Micheli, “Scalable logic rewriting using don’t cares,” in IEEE/ACM Proceedings Design, Automation and Test in Europe (DATE), 2024, pp. 1–6.

[3] R. Fu, R. Zhang, Z. Zheng, Z. Shi, Y. Pu, J. Huang, B. Yu, Q. Xu, and T.-Y. Ho, “CHOP: Clustered hybrid optimization for logic synthesis with self-supervised prediction,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2026.

[4] R. Fu, L. Shen, Z. Wang, Z. Lei, Z. Wang, J. Huang, B. Yu, and T.-Y. Ho, “DCLOG: Don’t cares-based logic optimization using pre-training graph neural networks,” in IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), 2026, pp. 793–799.

[5] R. Fu, W. Xuan, S. Yin, G. Hu, C. Chen, H. Zhang, B. Yu, and T.-Y. Ho, “eLogic: A e-graph-based logic rewriting framework for majority-inverter graphs,” in IEEE/ACM Proceedings Design, Automation and Test in Europe (DATE), 2026, pp. 1–6.

[6] Berkeley Logic Synthesis and Verification Group, “ABC: A system for sequential logic synthesis and verification,” http://www.eecs.berkeley.edu/~alanmi/abc/, Version 1.01.

[7] A. T. Calvino, H. Riener, S. Rai, A. Kumar, and G. De Micheli, “A versatile mapping approach for technology mapping and graph optimization,” in IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), 2022, pp. 410–416.

[8] A. Mishchenko, S. Cho, S. Chatterjee, and R. Brayton, “Combinational and sequential mapping with priority cuts,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2007, pp. 354–361.

[9] W. L. Neto, M. T. Moreira, Y. Li, L. Amaru, C. Yu, and P. E. Gaillardon, “SLAP: A supervised learning approach for priority cuts technology mapping,” in ACM/IEEE Design Automation Conference (DAC), vol. 2021-December, 2021, pp. 859–864.

[10] C. R. Chigarapally, H. N. Bhakkad, A. B. Chowdhury, C. Karfa, and S. Bhattacharjee, “LEAP: Learning guided quality cut selection for faster technology mapping,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2025.

[11] H. Pan, C. Lan, Y. Liu, Z. Wang, L. Shang, X. Zeng, F. Yang, and K. Zhu, “Physically aware synthesis revisited: Guiding technology mapping with primitive logic gate placement,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2025.

[12] A. T. Calvino and G. De Micheli, “Technology mapping using multi-output library cells,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023, pp. 1–9.

[13] L. Shang, S. Lu, S. Jung, Q. Liang, and C. Pan, “Novel FPGA technology mapping for dual-output LUTs: Methodology and application,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), pp. 1–1, 2025.

[14] M. Liu, D. Robinson, Y. Li, J. Maximilian Kuehn, R. Liang, H. Ren, and C. Yu, “MapTune: Versatile ASIC technology mapping via reinforcement learning guided library tuning,” ACM Transactions on Design Automation of Electronic Systems (TODAES), 2025.

[15] R. Fu, C. Wang, B. Yu, and T.-Y. Ho, “TeMACLE: A technology mapping-aware area-efficient standard cell library extension framework,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 44, no. 8, pp. 3034–3045, 2025.

[16] OpenAI, “Introducing GPT-5,” 2025. [Online]. Available: https://openai.com/index/introducing-gpt-5

[17] DeepSeek-AI, “DeepSeek-V3 technical report,” 2024. [Online]. Available: https://arxiv.org/abs/2412.19437

[18] Qwen Team, “Qwen3 technical report,” 2025. [Online]. Available: https://arxiv.org/abs/2505.09388

[19] H. Zheng, H. Wu, and Z. He, “ChatLS: Multimodal retrieval-augmented generation and chain-of-thought for logic synthesis script customization,” in ACM/IEEE Design Automation Conference (DAC), 2025, pp. 1–7.

[20] S. Huang, J. Li, Z. Yu, J. Ye, J. Xu, N. Xu, and G. Dai, “LLSM: LLM-enhanced logic synthesis model with EDA-guided CoT prompting, hybrid embedding and AIG-tailored acceleration,” in IEEE/ACM Asia and South Pacific Design Automation Conference (ASPDAC), 2025, pp. 974–980.

[21] A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. R. Ruiz, A. Mehrabian, M. P. Kumar, A. See, S. Chaudhuri, G. Holland, A. Davies, S. Nowozin, P. Kohli, and M. Balog, “AlphaEvolve: A coding agent for scientific and algorithmic discovery,” 2025. [Online]. Available: https://arxiv.org/abs/2506.13131

[22] A. Sharma, “OpenEvolve: an open-source evolutionary coding agent,” 2025. [Online]. Available: https://github.com/algorithmicsuperintelligence/openevolve

[23] M. C. Hansen, H. Yalcin, and J. P. Hayes, “Unveiling the ISCAS-85 benchmarks: A case study in reverse engineering,” IEEE Design & Test, vol. 16, no. 3, pp. 72–80, 1999.

[24] EPFL Integrated Systems Laboratory, “mockturtle: A C++ logic network library,” https://github.com/lsils/mockturtle, accessed November 2025.

[25] L. Amarù, P.-E. Gaillardon, and G. De Micheli, “The EPFL combinational benchmark suite,” in IEEE/ACM International Workshop on Logic Synthesis, 2015.

[26] L. Hellerman, “A catalog of three-variable or-invert and and-invert logical circuits,” IEEE Transactions on Electronic Computers, vol. EC-12, no. 3, pp. 198–223, 1963.

[27] V. Vashishtha, M. Vangala, and L. T. Clark, “ASAP7 predictive design kit development and cell design technology co-optimization: Invited paper,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017, pp. 992–998.