Evidence-Driven LLM Agent for C-to-Synthesizable-C Conversion and Verification

Hongbing Lang; John Imoleayo Adebisi; Luke Ztz Hu; Songping Mai; Zhe Zhao; Zhihan Xiao

arxiv: 2606.28409 · v1 · pith:OOLHUCYRnew · submitted 2026-06-25 · 💻 cs.AR · cs.AI

Evidence-Driven LLM Agent for C-to-Synthesizable-C Conversion and Verification

Zhe Zhao , Hongbing Lang , Zhihan Xiao , Luke Ztz Hu , John Imoleayo Adebisi , Songping Mai This is my paper

Pith reviewed 2026-06-30 01:40 UTC · model grok-4.3

classification 💻 cs.AR cs.AI

keywords LLM agenthigh-level synthesisC to HLS-C conversioncode repairmismatch localizationverification workflowevidence RAG

0 comments

The pith

An evidence-isolated four-stage verifier and mismatch localization chain lets LLM agents convert ordinary C into synthesizable HLS-C more reliably than prior models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames C-to-HLS-C conversion as a closed-loop generation-verification-diagnosis-repair task on a full HLS toolchain. It introduces a multi-agent workflow that keeps each stage of the Xilinx Vitis pipeline (compilation, C simulation, synthesis, C/RTL co-simulation) under strict evidence isolation so agents receive only normalized diagnostic signals rather than raw logs. Two supporting mechanisms localize mismatches via log normalization, AST backward slicing and dual-trace instrumentation, then retrieve typed repair evidence from a self-evolving card pool. If the workflow holds, it removes the brittleness that has limited earlier LLM repair systems to the first pipeline stages.

Core claim

The end-to-end workflow of cooperating agents closed by the four-stage verifier under strict evidence isolation, together with the Progressive Mismatch Localization Chain and typed-query two-stage evidence RAG backed by a self-evolving family-routed repair-card pool, produces C programs that complete the full HLS pipeline where previous LLM systems do not.

What carries the argument

The four-stage verifier under strict evidence isolation, which supplies only normalized diagnostic signals to the agents instead of raw tool logs, closing the generation-verification-diagnosis-repair loop.

If this is right

The workflow covers all four HLS pipeline stages rather than stopping at C compilation or CSim.
Progressive mismatch localization reduces the search space for repairs by identifying exact source locations of CSim and CoSim failures.
The self-evolving repair-card pool accumulates evidence across runs and routes it by code family.
Strict evidence isolation prevents the model from seeing unprocessed tool output and thereby improves reproducibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same isolation pattern could be applied to other synthesis or compilation toolchains beyond Xilinx Vitis.
If the repair-card pool is seeded with human-written fixes, the method might bootstrap from fewer LLM calls.
Dual-trace instrumentation inside PMLC could be reused for debugging ordinary software mismatches unrelated to hardware synthesis.

Load-bearing premise

The four-stage verifier under strict evidence isolation supplies sufficient, unbiased diagnostic signals for the agents to converge on correct repairs without the evaluation becoming circular or overly optimistic.

What would settle it

A benchmark of C programs on which the agent workflow reports successful CoSim passes yet the generated RTL fails timing or resource checks when placed on the target FPGA.

Figures

Figures reproduced from arXiv: 2606.28409 by Hongbing Lang, John Imoleayo Adebisi, Luke Ztz Hu, Songping Mai, Zhe Zhao, Zhihan Xiao.

**Figure 2.** Figure 2: C-to-HLS-C closed loop. The planner projects an interface-lossless brief [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: PMLC three-layer evidence chain. L1 normalizes the mismatch log into a structured failure object; L2 runs an AST backward slice (depth ≤ 2) to a key-variable map and merged suspect lines; L3 aligns the C golden trace with the HLS-C trace to localize the first-divergence cycle. The layers shrink the search space monotonically, Lines(L3) ⊆ Lines(L2) ⊆ Lines(L1) ⊆ Lines(y (k) i ) (Proposition 1). In-figure nu… view at source ↗

**Figure 5.** Figure 5: Information stratification between prompt-facing ( [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 4.** Figure 4: Self-evolving MoE RAG. A typed query qr over the fields stage, family, symbols, intent, and preserve hard-routes to one expert bucket (Stage 1) and returns up to khint ≤3 exact-matched repair cards, or an empty hit on any miss (Stage 2). Offline, human-audited repair chains (no reference HLS, testbench labels, or manual repairs) are promoted into the buckets while audit-failed cases enter the audit ledger,… view at source ↗

read the original abstract

Software-compilable C programs routinely fail to complete the four-stage pipeline of a high-level synthesis (HLS) toolchain -- compilation, C simulation (CSim), synthesis, and C/RTL co-simulation (CoSim) -- because HLS accepts only a synthesizable subset of C (HLS-C). Yet most existing large language model (LLM) systems built for HLS code repair only cover the early pipeline stages and feed raw tool logs directly to the model, yielding brittle and hard-to-reproduce fixes. We formulate C-to-HLS-C conversion as a closed-loop generation-verification-diagnosis-repair problem on an HLS tool (Xilinx Vitis), contributing three components: an end-to-end workflow of cooperating agents closed by the four-stage verifier under strict evidence isolation; a Progressive Mismatch Localization Chain (PMLC) that localizes CSim/CoSim mismatches through log normalization, AST backward slicing, and dual-trace instrumentation; and a typed-query, two-stage evidence RAG backed by a self-evolving, family-routed repair-card pool. Experimental results show that the proposed workflow substantially outperforms all comparable state-of-the-art models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper outlines a multi-agent HLS repair system with PMLC localization and typed RAG repair cards, but the abstract contains no numbers or baselines so the outperformance claim cannot be checked.

read the letter

The paper frames C-to-HLS-C conversion as a closed-loop agent task on Xilinx Vitis and introduces three pieces: a four-stage verifier with evidence isolation, the Progressive Mismatch Localization Chain that uses log normalization plus AST backward slicing, and a typed-query two-stage RAG backed by a self-evolving family-routed repair-card pool. These components target the specific pain points of brittle log feeding and incomplete pipeline coverage that earlier LLM HLS tools left open.

The architecture looks like a reasonable engineering response to the domain. Strict evidence isolation and family routing are explicit attempts to keep diagnostic signals clean, and the PMLC steps (normalization, slicing, dual-trace) are concrete rather than hand-wavy. That is the main concrete contribution visible from the description.

The obvious gap is that the abstract asserts substantial outperformance over state-of-the-art models yet supplies no success rates, dataset size, baseline names, or statistical detail. Without those numbers the central claim stays unevaluated, and it is impossible to tell whether the isolation actually prevents circular feedback or whether the repair cards generalize. The stress-test note correctly flags that the stated design tries to address circularity, but the absence of results still leaves the claim untestable.

This work is aimed at the narrow HLS and EDA community that already deals with Vitis pipeline failures. A reader already working on LLM agents for hardware design might pick up the PMLC and repair-pool ideas as implementation patterns, but only after seeing the experiments. The paper shows coherent problem decomposition and does not contain internal contradictions, so it is worth sending to referees who can demand the missing data and reproducibility artifacts.

Referee Report

1 major / 0 minor

Summary. The manuscript formulates C-to-HLS-C conversion as a closed-loop generation-verification-diagnosis-repair problem on Xilinx Vitis. It contributes an end-to-end multi-agent workflow closed by a four-stage verifier under strict evidence isolation, a Progressive Mismatch Localization Chain (PMLC) that combines log normalization, AST backward slicing, and dual-trace instrumentation, and a typed-query two-stage evidence RAG backed by a self-evolving family-routed repair-card pool. The central claim is that this workflow substantially outperforms comparable state-of-the-art models.

Significance. If the experimental results hold, the work could advance automated repair for the full HLS pipeline by replacing raw-log feedback with isolated, structured diagnostic signals. The explicit use of evidence isolation and family-routed routing directly targets circularity risks in agent feedback loops, which is a constructive architectural choice. However, the absence of any quantitative results, baselines, dataset size, or statistical tests in the provided text prevents assessment of whether these components deliver the claimed gains.

major comments (1)

[Abstract] Abstract: the claim that 'Experimental results show that the proposed workflow substantially outperforms all comparable state-of-the-art models' is unsupported by any numbers, baselines, dataset size, or statistical tests, so the central empirical contribution cannot be evaluated from the given text.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for identifying the need for concrete evidence to support the abstract's central claim. We address this point directly below.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'Experimental results show that the proposed workflow substantially outperforms all comparable state-of-the-art models' is unsupported by any numbers, baselines, dataset size, or statistical tests, so the central empirical contribution cannot be evaluated from the given text.

Authors: We agree that the abstract's empirical claim requires supporting detail for evaluation. The full manuscript contains an Experiments section that reports results on a benchmark suite of C programs, including direct comparisons against prior LLM-based HLS repair baselines, dataset cardinality, pass rates through the four-stage Vitis pipeline, and statistical significance. To make this evidence visible at the abstract level, we will revise the abstract to include the key quantitative outcomes (success rates, relative improvements, and dataset size) while preserving its length constraints. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes an empirical LLM-agent workflow for HLS code repair, with components including a four-stage verifier under evidence isolation, PMLC for mismatch localization, and typed-query RAG. No equations, fitted parameters, or derivation chains appear in the provided abstract or claimed architecture. The central outperformance claim rests on experimental results rather than any self-definitional reduction, fitted-input prediction, or self-citation load-bearing step. The explicit mention of strict evidence isolation directly addresses potential circularity risks in the closed-loop process, leaving the evaluation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

No mathematical free parameters or axioms are described. The work rests on domain assumptions about HLS tool behavior and the utility of LLM agents for code repair; two new system entities are introduced without independent evidence outside the paper.

invented entities (2)

Progressive Mismatch Localization Chain (PMLC) no independent evidence
purpose: localizes CSim/CoSim mismatches through log normalization, AST backward slicing, and dual-trace instrumentation
New diagnostic component introduced to support the closed-loop repair process
typed-query two-stage evidence RAG with self-evolving repair-card pool no independent evidence
purpose: provides repair suggestions to the agents
New retrieval mechanism backed by a family-routed card database

pith-pipeline@v0.9.1-grok · 5748 in / 1145 out tokens · 30481 ms · 2026-06-30T01:40:47.700173+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 12 canonical work pages · 6 internal anchors

[1]

A survey and evaluation of fpga high-level synthesis tools,

R. Nane, V .-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y . T. Chen, H. Hsiao, S. Brown, F. Ferrandi et al. , “A survey and evaluation of fpga high-level synthesis tools,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 35, no. 10, pp. 1591– 1604, 2015

2015
[2]

Are we there yet? a study on the state of high-level synthesis,

S. Lahti, P . Sjövall, J. V anne, and T. D. Hämäläinen, “Are we there yet? a study on the state of high-level synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 38, no. 5, pp. 898–911, 2018

2018
[3]

Fpga hls today: successes, challenges, and opportunities,

J. Cong, J. Lau, G. Liu, S. Neuendorffer, P . Pan, K. Vissers, and Z. Zhang, “Fpga hls today: successes, challenges, and opportunities,” ACM Transactions on Reconﬁgurable Technology and Systems (TRETS) , vol. 15, no. 4, pp. 1–42, 2022

2022
[4]

StarCoder: may the source be with you!

R. Li, L. B. Allal, Y . Zi, N. Muennighoff, D. Kocetkov, C. Mou, M. Marone, C. Akiki, J. Li, J. Chim et al. , “Starcoder: may the source be with you!” arXiv preprint arXiv:2305.06161 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[5]

V erigen: A large language model for verilog code generation,

S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, “V erigen: A large language model for verilog code generation,” ACM Transactions on Design Automation of Electronic Systems , vol. 29, no. 3, pp. 1–31, 2024

2024
[6]

V erilogeval: Evaluating large language models for verilog code generation,

M. Liu, N. Pinckney, B. Khailany, and H. Ren, “V erilogeval: Evaluating large language models for verilog code generation,” in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) . IEEE, 2023, pp. 1–8

2023
[7]

Rtllm: An open-source benchmark for design rtl generation with large language model,

Y . Lu, S. Liu, Q. Zhang, and Z. Xie, “Rtllm: An open-source benchmark for design rtl generation with large language model,” in 2024 29th Asia and South Paciﬁc Design Automation Conference (ASP-DAC) . IEEE, 2024, pp. 722–727

2024
[8]

Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution,

S. Liu, W. Fang, Y . Lu, Q. Zhang, H. Zhang, and Z. Xie, “Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution,” in 2024 IEEE LLM Aided Design Workshop (LAD). IEEE, 2024, pp. 1–5

2024
[9]

Rtlﬁxer: Automatically ﬁxing rtl syntax errors with large language model,

Y . Tsai, M. Liu, and H. Ren, “Rtlﬁxer: Automatically ﬁxing rtl syntax errors with large language model,” in Proceedings of the 61st ACM/IEEE Design Automation Conference , 2024, pp. 1–6

2024
[10]

C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,

L. Collini, S. Garg, and R. Karri, “C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,” ACM Transac- tions on Design Automation of Electronic Systems , vol. 30, no. 6, pp. 1–24, 2025

2025
[11]

Hlstrans: Dataset for c-to-hls hardware code synthesis,

Q. Zou, N. Chen, Y . Chen, B. He, and W. Wong, “Hlstrans: Dataset for c-to-hls hardware code synthesis,” arXiv preprint arXiv:2507.04315 , 2025

work page arXiv 2025
[12]

ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis

R. Li, J. Xiong, X. He, J. Zhao, J. Lv, H. Fang, L. Qi, and X. Wang, “Chathls: Towards systematic design automation and optimization for high-level synthesis,” arXiv preprint arXiv:2507.00642 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[13]

Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,

S. Abi-Karam and C. Hao, “Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,” in 2025 IEEE International Conference on LLM-Aided Design (ICLAD) . IEEE, 2025, pp. 219–226

2025
[14]

High-level synthesis design space explo- ration: Past, present, and future,

B. C. Schafer and Z. Wang, “High-level synthesis design space explo- ration: Past, present, and future,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 39, no. 10, pp. 2628– 2639, 2019

2019
[15]

Bambu: an open-source research framework for the high-level synthesis of complex applica- tions,

F. Ferrandi, V . G. Castellana, S. Curzel, P . Fezzardi, M. Fiorito, M. Lat- tuada, M. Minutoli, C. Pilato, and A. Tumeo, “Bambu: an open-source research framework for the high-level synthesis of complex applica- tions,” in 2021 58th ACM/IEEE Design Automation Conference (DAC) . IEEE, 2021, pp. 1327–1330

2021
[16]

From c/c++ code to high- performance dataﬂow circuits,

L. Josipovi ´c, A. Guerrieri, and P . Ienne, “From c/c++ code to high- performance dataﬂow circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 41, no. 7, pp. 2142–2155, 2021. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRA TED CIRCUITS AND SYSTEMS, VOL. XX, NO. X, MONTH YEAR 10

2021
[17]

Autodse: Enabling software programmers to design efﬁcient fpga accelerators,

A. Sohrabizadeh, C. H. Y u, M. Gao, and J. Cong, “Autodse: Enabling software programmers to design efﬁcient fpga accelerators,” ACM Trans- actions on Design Automation of Electronic Systems (TODAES) , vol. 27, no. 4, pp. 1–27, 2022

2022
[18]

Chstone: A benchmark program suite for practical c-based high-level synthesis,

Y . Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii, “Chstone: A benchmark program suite for practical c-based high-level synthesis,” in 2008 IEEE International Symposium on Circuits and Systems (ISCAS) . IEEE, 2008, pp. 1192–1195

2008
[19]

Mach- suite: Benchmarks for accelerator design and customized architectures,

B. Reagen, R. Adolf, Y . S. Shao, G.-Y . Wei, and D. Brooks, “Mach- suite: Benchmarks for accelerator design and customized architectures,” in 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2014, pp. 110–119

2014
[20]

Rosetta: A realistic high-level synthesis benchmark suite for software programmable fpgas,

Y . Zhou, U. Gupta, S. Dai, R. Zhao, N. Srivastava, H. Jin, J. Feath- erston, Y .-H. Lai, G. Liu, G. A. V elasquez et al. , “Rosetta: A realistic high-level synthesis benchmark suite for software programmable fpgas,” in Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays , 2018, pp. 269–278

2018
[21]

To- wards a comprehensive benchmark for high-level synthesis targeted to fpgas,

Y . Bai, A. Sohrabizadeh, Z. Qin, Z. Hu, Y . Sun, and J. Cong, “To- wards a comprehensive benchmark for high-level synthesis targeted to fpgas,” Advances in Neural Information Processing Systems , vol. 36, pp. 45 288–45 299, 2023

2023
[22]

Hlspilot: Llm-based high-level syn- thesis,

C. Xiong, C. Liu, H. Li, and X. Li, “Hlspilot: Llm-based high-level syn- thesis,” in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , 2024, pp. 1–9

2024
[23]

Sage-hls: Syntax-aware ast-guided llm for high-level synthesis code generation,

M. Z. S. Khan, N. Mashnoor, M. Akyash, K. Azar, and H. Kamali, “Sage-hls: Syntax-aware ast-guided llm for high-level synthesis code generation,” in 2025 IEEE 43rd International Conference on Computer Design (ICCD) . IEEE, 2025, pp. 574–581

2025
[24]

Synthai: A multi agent generative ai framework for automated modular hls design generation,

S. A. Sheikholeslam and A. Ivanov, “Synthai: A multi agent generative ai framework for automated modular hls design generation,” arXiv preprint arXiv:2405.16072, 2024

work page arXiv 2024
[25]

Automated c/c++ program repair for high-level synthesis via large lan- guage models,

K. Xu, G. L. Zhang, X. Yin, C. Zhuo, U. Schlichtmann, and B. Li, “Automated c/c++ program repair for high-level synthesis via large lan- guage models,” in Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD , 2024, pp. 1–9

2024
[26]

Hlsrewriter: Efﬁcient refactoring and optimization of c/c++ code with llms for high-level synthesis,

K. Xu, G. L. Zhang, X. Yin, C. Zhuo, U. Schlichtmann, and B. Li, “Hlsrewriter: Efﬁcient refactoring and optimization of c/c++ code with llms for high-level synthesis,” ACM Transactions on Design Automation of Electronic Systems , vol. 31, no. 4, pp. 1–21, 2026

2026
[27]

Hlsdebugger: Identiﬁcation and cor- rection of logic bugs in hls code with llm solutions,

J. Wang, S. Liu, Y . Lu, and Z. Xie, “Hlsdebugger: Identiﬁcation and cor- rection of logic bugs in hls code with llm solutions,” in 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) . IEEE, 2025, pp. 1–9

2025
[28]

Correcthdl: Agentic hdl design with llms leveraging high-level synthesis as reference,

K. Xu, G. L. Zhang, U. Schlichtmann, and B. Li, “Correcthdl: Agentic hdl design with llms leveraging high-level synthesis as reference,” arXiv preprint arXiv:2511.16395 , 2025

work page arXiv 2025
[29]

Automatic software repair: A bibliography,

M. Monperrus, “Automatic software repair: A bibliography,” ACM Com- puting Surveys (CSUR) , vol. 51, no. 1, pp. 1–24, 2018

2018
[30]

Automated program repair in the era of large pre-trained language models,

C. S. Xia, Y . Wei, and L. Zhang, “Automated program repair in the era of large pre-trained language models,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) . IEEE, 2023, pp. 1482–1494

2023
[31]

Repair is nearly generation: Multilingual program repair with llms,

H. Joshi, J. C. Sanchez, S. Gulwani, V . Le, G. V erbruggen, and I. Radi ˇcek, “Repair is nearly generation: Multilingual program repair with llms,” in Proceedings of the AAAI Conference on Artiﬁcial Intelli- gence, vol. 37, no. 4, 2023, pp. 5131–5140

2023
[32]

Repairagent: An autonomous, llm-based agent for program repair,

I. Bouzenia, P . Devanbu, and M. Pradel, “Repairagent: An autonomous, llm-based agent for program repair,” in 2025 IEEE/ACM 47th Interna- tional Conference on Software Engineering (ICSE) . IEEE, 2025, pp. 2188–2200

2025
[33]

VeriMoA: A Mixture-of-Agents Framework for Spec-to-HDL Generation

H. Ping, A. Bhattacharjee, P . Zhang, S. Li, W. Y ang, A. Cheng, X. Zhang, J. Thomason, A. Jannesari, N. Ahmed et al. , “V erimoa: A mixture-of-agents framework for spec-to-hdl generation,” arXiv preprint arXiv:2510.27617, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[34]

Deepv: a model-agnostic retrieval- augmented framework for verilog code generation with a high-quality knowledge base,

Z. Ibnat, P . E. Calzada, R. M. Ihtemam, S. K. Saha, J. Zhou, F. Farahmandi, and M. Tehranipoor, “Deepv: a model-agnostic retrieval- augmented framework for verilog code generation with a high-quality knowledge base,” arXiv preprint arXiv:2510.05327 , 2025

work page arXiv 2025
[35]

Lift: Llm-based pragma insertion for hls via gnn supervised ﬁne-tuning,

N. Prakriya, Z. Ding, Y . Sun, and J. Cong, “Lift: Llm-based pragma insertion for hls via gnn supervised ﬁne-tuning,” arXiv preprint arXiv:2504.21187, 2025

work page arXiv 2025
[36]

Timelyhls: Llm- based timing-aware and architecture-speciﬁc fpga hls optimization,

N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “Timelyhls: Llm- based timing-aware and architecture-speciﬁc fpga hls optimization,” in 2025 IEEE International Conference on Omni-layer Intelligent Systems (COINS). IEEE, 2025, pp. 1–6

2025
[37]

Agenttts: Large language model agent for test- time compute-optimal scaling strategy in complex tasks,

F. Wang, H. Liu, Z. Dai, J. Zeng, Z. Zhang, Z. Wu, C. Luo, Z. Li, X. Tang, Q. He et al. , “Agenttts: Large language model agent for test- time compute-optimal scaling strategy in complex tasks,” Advances in Neural Information Processing Systems , vol. 38, pp. 98 396–98 433, 2026

2026
[38]

Can reasoning models reason about hardware? an agentic hls perspective,

L. Collini, A. Hennessee, R. Karri, and S. Garg, “Can reasoning models reason about hardware? an agentic hls perspective,” in 2025 IEEE In- ternational Conference on LLM-Aided Design (ICLAD) . IEEE, 2025, pp. 188–194

2025
[39]

Retrieval-augmented generation for knowledge-intensive nlp tasks,

P . Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Küt- tler, M. Lewis, W.-t. Yih, T. Rocktäschel et al. , “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in neural in- formation processing systems , vol. 33, pp. 9459–9474, 2020

2020
[40]

Self-rag: Learn- ing to retrieve, generate, and critique through self-reﬂection,

A. Asai, Z. Wu, Y . Wang, A. Sil, and H. Hajishirzi, “Self-rag: Learn- ing to retrieve, generate, and critique through self-reﬂection,” in Inter- national conference on learning representations , vol. 2024, 2024, pp. 9112–9141

2024
[41]

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson, “From local to global: A graph rag approach to query-focused summarization,” arXiv preprint arXiv:2404.16130, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[42]

ReAct: Synergizing Reasoning and Acting in Language Models

S. Y ao, J. Zhao, D. Y u, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” arXiv preprint arXiv:2210.03629 , 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[43]

Self-reﬁne: Iter- ative reﬁnement with self-feedback,

A. Madaan, N. Tandon, P . Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y . Y ang et al. , “Self-reﬁne: Iter- ative reﬁnement with self-feedback,” Advances in neural information processing systems , vol. 36, pp. 46 534–46 594, 2023

2023
[44]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wu, G. Bansal, J. Zhang, Y . Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu et al. , “Autogen: Enabling next-gen llm applications via multi-agent conversation,” arXiv preprint arXiv:2308.08155 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[45]

Ask-eda: A design assistant empowered by llm, hybrid rag and abbreviation de-hallucination,

L. Shi, M. Kazda, B. Sears, N. Shropshire, and R. Puri, “Ask-eda: A design assistant empowered by llm, hybrid rag and abbreviation de-hallucination,” in 2024 IEEE LLM Aided Design Workshop (LAD) . IEEE, 2024, pp. 1–5

2024
[46]

Orassistant: A cus- tom rag-based conversational assistant for openroad,

A. Kaintura, S. S. Luar, I. I. Almeida et al. , “Orassistant: A cus- tom rag-based conversational assistant for openroad,” arXiv preprint arXiv:2410.03845, 2024. Zhe Zhao is currently pursuing the M.E. degree in integrated circuits and systems with the Shenzhen International Graduate School, Tsinghua Univer- sity, Shenzhen, China. His research interests i...

work page arXiv 2024

[1] [1]

A survey and evaluation of fpga high-level synthesis tools,

R. Nane, V .-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y . T. Chen, H. Hsiao, S. Brown, F. Ferrandi et al. , “A survey and evaluation of fpga high-level synthesis tools,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 35, no. 10, pp. 1591– 1604, 2015

2015

[2] [2]

Are we there yet? a study on the state of high-level synthesis,

S. Lahti, P . Sjövall, J. V anne, and T. D. Hämäläinen, “Are we there yet? a study on the state of high-level synthesis,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 38, no. 5, pp. 898–911, 2018

2018

[3] [3]

Fpga hls today: successes, challenges, and opportunities,

J. Cong, J. Lau, G. Liu, S. Neuendorffer, P . Pan, K. Vissers, and Z. Zhang, “Fpga hls today: successes, challenges, and opportunities,” ACM Transactions on Reconﬁgurable Technology and Systems (TRETS) , vol. 15, no. 4, pp. 1–42, 2022

2022

[4] [4]

StarCoder: may the source be with you!

R. Li, L. B. Allal, Y . Zi, N. Muennighoff, D. Kocetkov, C. Mou, M. Marone, C. Akiki, J. Li, J. Chim et al. , “Starcoder: may the source be with you!” arXiv preprint arXiv:2305.06161 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[5] [5]

V erigen: A large language model for verilog code generation,

S. Thakur, B. Ahmad, H. Pearce, B. Tan, B. Dolan-Gavitt, R. Karri, and S. Garg, “V erigen: A large language model for verilog code generation,” ACM Transactions on Design Automation of Electronic Systems , vol. 29, no. 3, pp. 1–31, 2024

2024

[6] [6]

V erilogeval: Evaluating large language models for verilog code generation,

M. Liu, N. Pinckney, B. Khailany, and H. Ren, “V erilogeval: Evaluating large language models for verilog code generation,” in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD) . IEEE, 2023, pp. 1–8

2023

[7] [7]

Rtllm: An open-source benchmark for design rtl generation with large language model,

Y . Lu, S. Liu, Q. Zhang, and Z. Xie, “Rtllm: An open-source benchmark for design rtl generation with large language model,” in 2024 29th Asia and South Paciﬁc Design Automation Conference (ASP-DAC) . IEEE, 2024, pp. 722–727

2024

[8] [8]

Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution,

S. Liu, W. Fang, Y . Lu, Q. Zhang, H. Zhang, and Z. Xie, “Rtlcoder: Outperforming gpt-3.5 in design rtl generation with our open-source dataset and lightweight solution,” in 2024 IEEE LLM Aided Design Workshop (LAD). IEEE, 2024, pp. 1–5

2024

[9] [9]

Rtlﬁxer: Automatically ﬁxing rtl syntax errors with large language model,

Y . Tsai, M. Liu, and H. Ren, “Rtlﬁxer: Automatically ﬁxing rtl syntax errors with large language model,” in Proceedings of the 61st ACM/IEEE Design Automation Conference , 2024, pp. 1–6

2024

[10] [10]

C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,

L. Collini, S. Garg, and R. Karri, “C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,” ACM Transac- tions on Design Automation of Electronic Systems , vol. 30, no. 6, pp. 1–24, 2025

2025

[11] [11]

Hlstrans: Dataset for c-to-hls hardware code synthesis,

Q. Zou, N. Chen, Y . Chen, B. He, and W. Wong, “Hlstrans: Dataset for c-to-hls hardware code synthesis,” arXiv preprint arXiv:2507.04315 , 2025

work page arXiv 2025

[12] [12]

ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis

R. Li, J. Xiong, X. He, J. Zhao, J. Lv, H. Fang, L. Qi, and X. Wang, “Chathls: Towards systematic design automation and optimization for high-level synthesis,” arXiv preprint arXiv:2507.00642 , 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[13] [13]

Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,

S. Abi-Karam and C. Hao, “Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,” in 2025 IEEE International Conference on LLM-Aided Design (ICLAD) . IEEE, 2025, pp. 219–226

2025

[14] [14]

High-level synthesis design space explo- ration: Past, present, and future,

B. C. Schafer and Z. Wang, “High-level synthesis design space explo- ration: Past, present, and future,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 39, no. 10, pp. 2628– 2639, 2019

2019

[15] [15]

Bambu: an open-source research framework for the high-level synthesis of complex applica- tions,

F. Ferrandi, V . G. Castellana, S. Curzel, P . Fezzardi, M. Fiorito, M. Lat- tuada, M. Minutoli, C. Pilato, and A. Tumeo, “Bambu: an open-source research framework for the high-level synthesis of complex applica- tions,” in 2021 58th ACM/IEEE Design Automation Conference (DAC) . IEEE, 2021, pp. 1327–1330

2021

[16] [16]

From c/c++ code to high- performance dataﬂow circuits,

L. Josipovi ´c, A. Guerrieri, and P . Ienne, “From c/c++ code to high- performance dataﬂow circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems , vol. 41, no. 7, pp. 2142–2155, 2021. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRA TED CIRCUITS AND SYSTEMS, VOL. XX, NO. X, MONTH YEAR 10

2021

[17] [17]

Autodse: Enabling software programmers to design efﬁcient fpga accelerators,

A. Sohrabizadeh, C. H. Y u, M. Gao, and J. Cong, “Autodse: Enabling software programmers to design efﬁcient fpga accelerators,” ACM Trans- actions on Design Automation of Electronic Systems (TODAES) , vol. 27, no. 4, pp. 1–27, 2022

2022

[18] [18]

Chstone: A benchmark program suite for practical c-based high-level synthesis,

Y . Hara, H. Tomiyama, S. Honda, H. Takada, and K. Ishii, “Chstone: A benchmark program suite for practical c-based high-level synthesis,” in 2008 IEEE International Symposium on Circuits and Systems (ISCAS) . IEEE, 2008, pp. 1192–1195

2008

[19] [19]

Mach- suite: Benchmarks for accelerator design and customized architectures,

B. Reagen, R. Adolf, Y . S. Shao, G.-Y . Wei, and D. Brooks, “Mach- suite: Benchmarks for accelerator design and customized architectures,” in 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2014, pp. 110–119

2014

[20] [20]

Rosetta: A realistic high-level synthesis benchmark suite for software programmable fpgas,

Y . Zhou, U. Gupta, S. Dai, R. Zhao, N. Srivastava, H. Jin, J. Feath- erston, Y .-H. Lai, G. Liu, G. A. V elasquez et al. , “Rosetta: A realistic high-level synthesis benchmark suite for software programmable fpgas,” in Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays , 2018, pp. 269–278

2018

[21] [21]

To- wards a comprehensive benchmark for high-level synthesis targeted to fpgas,

Y . Bai, A. Sohrabizadeh, Z. Qin, Z. Hu, Y . Sun, and J. Cong, “To- wards a comprehensive benchmark for high-level synthesis targeted to fpgas,” Advances in Neural Information Processing Systems , vol. 36, pp. 45 288–45 299, 2023

2023

[22] [22]

Hlspilot: Llm-based high-level syn- thesis,

C. Xiong, C. Liu, H. Li, and X. Li, “Hlspilot: Llm-based high-level syn- thesis,” in Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design , 2024, pp. 1–9

2024

[23] [23]

Sage-hls: Syntax-aware ast-guided llm for high-level synthesis code generation,

M. Z. S. Khan, N. Mashnoor, M. Akyash, K. Azar, and H. Kamali, “Sage-hls: Syntax-aware ast-guided llm for high-level synthesis code generation,” in 2025 IEEE 43rd International Conference on Computer Design (ICCD) . IEEE, 2025, pp. 574–581

2025

[24] [24]

Synthai: A multi agent generative ai framework for automated modular hls design generation,

S. A. Sheikholeslam and A. Ivanov, “Synthai: A multi agent generative ai framework for automated modular hls design generation,” arXiv preprint arXiv:2405.16072, 2024

work page arXiv 2024

[25] [25]

Automated c/c++ program repair for high-level synthesis via large lan- guage models,

K. Xu, G. L. Zhang, X. Yin, C. Zhuo, U. Schlichtmann, and B. Li, “Automated c/c++ program repair for high-level synthesis via large lan- guage models,” in Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD , 2024, pp. 1–9

2024

[26] [26]

Hlsrewriter: Efﬁcient refactoring and optimization of c/c++ code with llms for high-level synthesis,

K. Xu, G. L. Zhang, X. Yin, C. Zhuo, U. Schlichtmann, and B. Li, “Hlsrewriter: Efﬁcient refactoring and optimization of c/c++ code with llms for high-level synthesis,” ACM Transactions on Design Automation of Electronic Systems , vol. 31, no. 4, pp. 1–21, 2026

2026

[27] [27]

Hlsdebugger: Identiﬁcation and cor- rection of logic bugs in hls code with llm solutions,

J. Wang, S. Liu, Y . Lu, and Z. Xie, “Hlsdebugger: Identiﬁcation and cor- rection of logic bugs in hls code with llm solutions,” in 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD) . IEEE, 2025, pp. 1–9

2025

[28] [28]

Correcthdl: Agentic hdl design with llms leveraging high-level synthesis as reference,

K. Xu, G. L. Zhang, U. Schlichtmann, and B. Li, “Correcthdl: Agentic hdl design with llms leveraging high-level synthesis as reference,” arXiv preprint arXiv:2511.16395 , 2025

work page arXiv 2025

[29] [29]

Automatic software repair: A bibliography,

M. Monperrus, “Automatic software repair: A bibliography,” ACM Com- puting Surveys (CSUR) , vol. 51, no. 1, pp. 1–24, 2018

2018

[30] [30]

Automated program repair in the era of large pre-trained language models,

C. S. Xia, Y . Wei, and L. Zhang, “Automated program repair in the era of large pre-trained language models,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) . IEEE, 2023, pp. 1482–1494

2023

[31] [31]

Repair is nearly generation: Multilingual program repair with llms,

H. Joshi, J. C. Sanchez, S. Gulwani, V . Le, G. V erbruggen, and I. Radi ˇcek, “Repair is nearly generation: Multilingual program repair with llms,” in Proceedings of the AAAI Conference on Artiﬁcial Intelli- gence, vol. 37, no. 4, 2023, pp. 5131–5140

2023

[32] [32]

Repairagent: An autonomous, llm-based agent for program repair,

I. Bouzenia, P . Devanbu, and M. Pradel, “Repairagent: An autonomous, llm-based agent for program repair,” in 2025 IEEE/ACM 47th Interna- tional Conference on Software Engineering (ICSE) . IEEE, 2025, pp. 2188–2200

2025

[33] [33]

VeriMoA: A Mixture-of-Agents Framework for Spec-to-HDL Generation

H. Ping, A. Bhattacharjee, P . Zhang, S. Li, W. Y ang, A. Cheng, X. Zhang, J. Thomason, A. Jannesari, N. Ahmed et al. , “V erimoa: A mixture-of-agents framework for spec-to-hdl generation,” arXiv preprint arXiv:2510.27617, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[34] [34]

Deepv: a model-agnostic retrieval- augmented framework for verilog code generation with a high-quality knowledge base,

Z. Ibnat, P . E. Calzada, R. M. Ihtemam, S. K. Saha, J. Zhou, F. Farahmandi, and M. Tehranipoor, “Deepv: a model-agnostic retrieval- augmented framework for verilog code generation with a high-quality knowledge base,” arXiv preprint arXiv:2510.05327 , 2025

work page arXiv 2025

[35] [35]

Lift: Llm-based pragma insertion for hls via gnn supervised ﬁne-tuning,

N. Prakriya, Z. Ding, Y . Sun, and J. Cong, “Lift: Llm-based pragma insertion for hls via gnn supervised ﬁne-tuning,” arXiv preprint arXiv:2504.21187, 2025

work page arXiv 2025

[36] [36]

Timelyhls: Llm- based timing-aware and architecture-speciﬁc fpga hls optimization,

N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “Timelyhls: Llm- based timing-aware and architecture-speciﬁc fpga hls optimization,” in 2025 IEEE International Conference on Omni-layer Intelligent Systems (COINS). IEEE, 2025, pp. 1–6

2025

[37] [37]

Agenttts: Large language model agent for test- time compute-optimal scaling strategy in complex tasks,

F. Wang, H. Liu, Z. Dai, J. Zeng, Z. Zhang, Z. Wu, C. Luo, Z. Li, X. Tang, Q. He et al. , “Agenttts: Large language model agent for test- time compute-optimal scaling strategy in complex tasks,” Advances in Neural Information Processing Systems , vol. 38, pp. 98 396–98 433, 2026

2026

[38] [38]

Can reasoning models reason about hardware? an agentic hls perspective,

L. Collini, A. Hennessee, R. Karri, and S. Garg, “Can reasoning models reason about hardware? an agentic hls perspective,” in 2025 IEEE In- ternational Conference on LLM-Aided Design (ICLAD) . IEEE, 2025, pp. 188–194

2025

[39] [39]

Retrieval-augmented generation for knowledge-intensive nlp tasks,

P . Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Küt- tler, M. Lewis, W.-t. Yih, T. Rocktäschel et al. , “Retrieval-augmented generation for knowledge-intensive nlp tasks,” Advances in neural in- formation processing systems , vol. 33, pp. 9459–9474, 2020

2020

[40] [40]

Self-rag: Learn- ing to retrieve, generate, and critique through self-reﬂection,

A. Asai, Z. Wu, Y . Wang, A. Sil, and H. Hajishirzi, “Self-rag: Learn- ing to retrieve, generate, and critique through self-reﬂection,” in Inter- national conference on learning representations , vol. 2024, 2024, pp. 9112–9141

2024

[41] [41]

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

D. Edge, H. Trinh, N. Cheng, J. Bradley, A. Chao, A. Mody, S. Truitt, D. Metropolitansky, R. O. Ness, and J. Larson, “From local to global: A graph rag approach to query-focused summarization,” arXiv preprint arXiv:2404.16130, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[42] [42]

ReAct: Synergizing Reasoning and Acting in Language Models

S. Y ao, J. Zhao, D. Y u, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “React: Synergizing reasoning and acting in language models,” arXiv preprint arXiv:2210.03629 , 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[43] [43]

Self-reﬁne: Iter- ative reﬁnement with self-feedback,

A. Madaan, N. Tandon, P . Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y . Y ang et al. , “Self-reﬁne: Iter- ative reﬁnement with self-feedback,” Advances in neural information processing systems , vol. 36, pp. 46 534–46 594, 2023

2023

[44] [44]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wu, G. Bansal, J. Zhang, Y . Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu et al. , “Autogen: Enabling next-gen llm applications via multi-agent conversation,” arXiv preprint arXiv:2308.08155 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[45] [45]

Ask-eda: A design assistant empowered by llm, hybrid rag and abbreviation de-hallucination,

L. Shi, M. Kazda, B. Sears, N. Shropshire, and R. Puri, “Ask-eda: A design assistant empowered by llm, hybrid rag and abbreviation de-hallucination,” in 2024 IEEE LLM Aided Design Workshop (LAD) . IEEE, 2024, pp. 1–5

2024

[46] [46]

Orassistant: A cus- tom rag-based conversational assistant for openroad,

A. Kaintura, S. S. Luar, I. I. Almeida et al. , “Orassistant: A cus- tom rag-based conversational assistant for openroad,” arXiv preprint arXiv:2410.03845, 2024. Zhe Zhao is currently pursuing the M.E. degree in integrated circuits and systems with the Shenzhen International Graduate School, Tsinghua Univer- sity, Shenzhen, China. His research interests i...

work page arXiv 2024