Shift-Left High-Level Synthesis Verification via Knowledge-Augmented LLM Agent

Hongbing Lang; Luke Ztz Hu; Songping Mai; Zhe Zhao; Zhihan Xiao

arxiv: 2606.17128 · v3 · pith:7GS7GRPVnew · submitted 2026-06-15 · 💻 cs.AR

Shift-Left High-Level Synthesis Verification via Knowledge-Augmented LLM Agent

Zhihan Xiao , Hongbing Lang , Zhe Zhao , Luke Ztz Hu , Songping Mai This is my paper

Pith reviewed 2026-06-27 02:45 UTC · model grok-4.3

classification 💻 cs.AR

keywords High-Level SynthesisFunctional VerificationLLM AgentsKnowledge GraphsTestbench GenerationEquivalence CheckingShift-Left Verification

0 comments

The pith

A knowledge-augmented LLM agent with dual-tier checks verifies functional consistency between original C and HLS-C code before synthesis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a shift-left verification framework that augments an LLM agent with a heterogeneous knowledge graph and dual-tier consistency checking to automate equivalence verification between original C specifications and HLS-C implementations. The approach combines static structural alignment with dynamic behavioral equivalence, symbolic execution, and coverage-driven refinement to counter the stochastic limitations of plain LLMs in testbench generation. A sympathetic reader would care because manual functional consistency checking remains labor-intensive in HLS design flows, and reliable pre-synthesis automation could reduce errors that surface only after costly synthesis runs.

Core claim

The framework introduces a Dual-Tier Consistency Checking mechanism that jointly enforces static structural alignment and dynamic behavioral equivalence between paired testbenches, while integrating symbolic execution and coverage-driven refinement to improve verification completeness. Furthermore, we construct a heterogeneous HLS Verification Knowledge Graph to provide topology-aware reasoning priors for testbench generation, and design an autonomous verification agent to orchestrate iterative refinement and failure diagnosis across heterogeneous toolchains. Experimental results on 107 HLS benchmark pairs demonstrate that the proposed framework achieves 0.9826 average coverage and 0.9533 dy

What carries the argument

Dual-Tier Consistency Checking mechanism that jointly enforces static structural alignment and dynamic behavioral equivalence, supported by a heterogeneous HLS Verification Knowledge Graph for topology-aware reasoning priors.

If this is right

Verification moves before synthesis, enabling earlier detection of functional mismatches in the HLS flow.
The autonomous agent performs iterative refinement and cross-toolchain failure diagnosis without manual intervention.
Coverage-driven refinement combined with symbolic execution raises average coverage above 0.98 on benchmark pairs.
Dynamic consistency reaches 0.95 while outperforming AST-based, retrieval-augmented, and plain iterative agent baselines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dual-tier structure and knowledge-graph priors could be adapted to verify other source-to-source transformations such as compiler passes or domain-specific language translations.
Extending the knowledge graph with additional hardware-specific equivalence rules might further reduce false consistency signals on edge-case arithmetic or pointer behaviors.
Integration into commercial HLS tools could shorten the manual testbench creation loop that currently dominates early design effort.

Load-bearing premise

The Dual-Tier Consistency Checking mechanism together with the heterogeneous HLS Verification Knowledge Graph supplies unbiased, sufficiently complete priors that do not systematically miss equivalence failures or produce false consistency signals.

What would settle it

A pair of original C and HLS-C implementations that differ in observable behavior yet receive high structural alignment and dynamic consistency scores from the framework.

Figures

Figures reproduced from arXiv: 2606.17128 by Hongbing Lang, Luke Ztz Hu, Songping Mai, Zhe Zhao, Zhihan Xiao.

**Figure 2.** Figure 2: Overview of the proposed KG-augmented, agent-based verification framework for functional consistency checking between original C and HLS-C [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the proposed KG-augmented verification framework. During inference, graph-retrieved reasoning priors guide coverage-driven stimulus [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

read the original abstract

High-Level Synthesis (HLS) relies on transforming original C specifications into synthesizable HLS-oriented C (HLS-C) implementations. Functional consistency verification between original C specifications and HLS-C implementations is a critical yet labor-intensive task in HLS design flows. While Large Language Models (LLMs) have recently shown promise in automated testbench generation, their stochastic nature often leads to insufficient coverage, inconsistent verification environments, and unreliable equivalence checking results. To address these limitations, we propose a knowledge-augmented, agent-driven shift-left verification framework for automated functional consistency checking between original C and HLS-C implementations before synthesis. The framework introduces a Dual-Tier Consistency Checking mechanism that jointly enforces static structural alignment and dynamic behavioral equivalence between paired testbenches, while integrating symbolic execution and coverage-driven refinement to improve verification completeness. Furthermore, we construct a heterogeneous HLS Verification Knowledge Graph to provide topology-aware reasoning priors for testbench generation, and design an autonomous verification agent to orchestrate iterative refinement and failure diagnosis across heterogeneous toolchains. Experimental results on 107 HLS benchmark pairs demonstrate that the proposed framework achieves 0.9826 average coverage and 0.9533 dynamic consistency, outperforming representative AST-based, retrieval-augmented, and iterative agent-based baselines. https://github.com/cz-5f/HLS-LeVeri.git

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies an LLM agent plus knowledge graph and dual-tier checks to pre-synthesis C/HLS-C equivalence but the reported gains rest on untested assumptions about what the checks actually catch.

read the letter

The paper's key move is wrapping an LLM agent around a heterogeneous HLS verification knowledge graph and a dual-tier consistency checker to automate functional equivalence between original C specs and HLS-C implementations before synthesis. It reports strong numbers: 0.9826 average coverage and 0.9533 dynamic consistency across 107 benchmark pairs, beating AST-based, retrieval-augmented, and iterative agent baselines.

What is actually new is the concrete integration of topology-aware graph priors with static-dynamic checking and symbolic execution inside the agent loop for this specific HLS verification task. The github release adds value for anyone who wants to inspect or extend the implementation.

The setup is reasonable on paper, but the experimental section leaves some gaps. There is no detail on benchmark selection, baseline re-implementations, or an independent set of known non-equivalent pairs to confirm the checker actually flags failures rather than missing them. If the knowledge graph or the dual-tier mechanism under-represents data-dependent control or floating-point cases, the high consistency scores could come from the testbenches not probing those areas. That matches the stress-test concern.

This is for people working on verification automation in high-level synthesis or applying agents to EDA flows. A reader looking for practical examples of knowledge-augmented agents in hardware design would find the framework useful once the controls are clarified. It has enough novelty in the application and a code artifact to justify sending it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper proposes a knowledge-augmented LLM agent framework for shift-left functional consistency verification between original C specifications and HLS-C implementations. It introduces a Dual-Tier Consistency Checking mechanism (static structural alignment plus dynamic behavioral equivalence via symbolic execution and coverage refinement), a heterogeneous HLS Verification Knowledge Graph for topology-aware priors, and an autonomous verification agent for iterative refinement and diagnosis. On 107 HLS benchmark pairs the framework reports 0.9826 average coverage and 0.9533 dynamic consistency, outperforming AST-based, retrieval-augmented, and iterative agent baselines. The GitHub repository is provided for reproducibility.

Significance. If the empirical results hold under rigorous validation, the work could meaningfully reduce manual verification effort in HLS design flows by combining LLM agents with structured knowledge and dual-tier checks. The open-source code release is a clear strength that supports reproducibility and future extensions.

major comments (2)

[Abstract / §4] Abstract / §4 (Experimental Results): The headline performance numbers (0.9826 coverage, 0.9533 consistency on 107 pairs) are presented without any description of benchmark selection criteria, re-implementation details for the three baseline categories, statistical significance testing, or analysis of failure cases. These omissions make it impossible to determine whether the reported superiority is load-bearing for the central claim.
[§3.2 / §3.3] §3.2 (Dual-Tier Consistency Checking) and §3.3 (Knowledge Graph): The paper supplies no independent validation set of known non-equivalent C/HLS-C pairs. Without such a test, it remains unclear whether the static/dynamic checks and graph priors systematically miss equivalence failures (e.g., data-dependent control flow or synthesis-induced transformations) or produce false consistency signals, directly affecting the reliability of the 0.9533 dynamic consistency metric.

minor comments (1)

[Abstract] The abstract states the GitHub link but the manuscript does not indicate which artifacts (benchmarks, scripts, or trained models) are actually released.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate the requested details and validation experiments.

read point-by-point responses

Referee: [Abstract / §4] Abstract / §4 (Experimental Results): The headline performance numbers (0.9826 coverage, 0.9533 consistency on 107 pairs) are presented without any description of benchmark selection criteria, re-implementation details for the three baseline categories, statistical significance testing, or analysis of failure cases. These omissions make it impossible to determine whether the reported superiority is load-bearing for the central claim.

Authors: We agree that these details are necessary to substantiate the empirical claims. In the revised manuscript we will expand §4 to include: (1) explicit benchmark selection criteria, describing how the 107 pairs were drawn from standard HLS suites (CHStone, PolyBench, and representative industrial designs) with diversity requirements on control-flow complexity and data dependencies; (2) re-implementation details for the AST-based, RAG, and iterative-agent baselines, including the concrete algorithms, retrieval configurations, and prompt templates employed; (3) statistical significance testing (paired t-tests and bootstrap confidence intervals) on the reported differences; and (4) a failure-case analysis that categorizes the small number of pairs where full consistency was not reached and discusses root causes. These additions will make the superiority claims verifiable. revision: yes
Referee: [§3.2 / §3.3] §3.2 (Dual-Tier Consistency Checking) and §3.3 (Knowledge Graph): The paper supplies no independent validation set of known non-equivalent C/HLS-C pairs. Without such a test, it remains unclear whether the static/dynamic checks and graph priors systematically miss equivalence failures (e.g., data-dependent control flow or synthesis-induced transformations) or produce false consistency signals, directly affecting the reliability of the 0.9533 dynamic consistency metric.

Authors: This is a fair and important observation. Our current evaluation uses pairs that are intended to be functionally equivalent after HLS transformation. To directly address the concern, we will add a controlled validation experiment in the revised §4 (or an appendix). We will create a set of known non-equivalent pairs by injecting targeted discrepancies—such as altered loop bounds, modified conditional logic, or synthesis-incompatible constructs—into a subset of the benchmarks. We will then measure the framework’s detection rate for these inconsistencies, reporting both true-positive detection and any false-consistency signals. This will provide quantitative evidence on the sensitivity of the dual-tier checks and the 0.9533 metric. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical evaluation on external benchmarks

full rationale

The paper proposes an LLM-agent framework with a Dual-Tier Consistency Checking mechanism and a heterogeneous HLS Verification Knowledge Graph, then reports empirical measurements (0.9826 average coverage, 0.9533 dynamic consistency) on 107 external HLS benchmark pairs. No derivation chain, fitted parameters, or self-citation load-bearing steps exist; the headline numbers are direct experimental outcomes rather than quantities forced by construction from the framework's own priors or inputs. The evaluation uses independent benchmarks and compares against external baselines, keeping the result self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.1-grok · 5773 in / 1016 out tokens · 43903 ms · 2026-06-27T02:45:18.764499+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages

[1]

A review on ai acceleration with fpga and asics: A comparative study,

M. Kotari, S. Kanabargi, S. Aloriet al., “A review on ai acceleration with fpga and asics: A comparative study,” in2025 IEEE 4th International Conference on Data, Decision and Systems (ICDDS). IEEE, 2025, pp. 212–215

2025
[2]

A survey on convolutional neural network accelerators: Gpu, fpga and asic,

Y . Hu, Y . Liu, and Z. Liu, “A survey on convolutional neural network accelerators: Gpu, fpga and asic,” in2022 14th International Conference on Computer Research and Development (ICCRD). IEEE, 2022, pp. 100–107

2022
[3]

Enhancing asic hls design optimization via knowledge transfer from fpga hls models,

S. Zheng, Z. Ding, C. Wang, B. Yu, M. Wong, Y . Sun, and J. Cong, “Enhancing asic hls design optimization via knowledge transfer from fpga hls models,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2026

2026
[4]

Hls-based optimization and design space explo- ration for applications with variable loop bounds,

Y .-k. Choi and J. Cong, “Hls-based optimization and design space explo- ration for applications with variable loop bounds,” in2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018, pp. 1–8

2018
[5]

Universal verification methodology (uvm) 1.2 user’s guide,

A. Initiative, “Universal verification methodology (uvm) 1.2 user’s guide,”Accellera Systems Initiative: Elk Grove, CA, USA, 2015

2015
[6]

High-level synthesis: Past, present, and future,

G. Martin and G. Smith, “High-level synthesis: Past, present, and future,” IEEE Design & Test of Computers, vol. 26, no. 4, pp. 18–25, 2009

2009
[7]

Legup: high-level synthesis for fpga-based processor/accelerator systems,

A. Canis, J. Choi, M. Aldham, V . Zhang, A. Kammoona, J. H. An- derson, S. Brown, and T. Czajkowski, “Legup: high-level synthesis for fpga-based processor/accelerator systems,” inProceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, 2011, pp. 33–36

2011
[8]

Fpga- based solution for on-board verification of hardware modules using hls,

J. Caba, F. Rinc ´on, J. Barba, J. A. de la Torre, and J. C. L ´opez, “Fpga- based solution for on-board verification of hardware modules using hls,” Electronics, vol. 9, no. 12, p. 2024, 2020

2024
[9]

A shift-left approach in qualifi- cation of digital ips for socs by applying advanced automation and data analytics,

H. J. Chakraborty and D. K. Acharya, “A shift-left approach in qualifi- cation of digital ips for socs by applying advanced automation and data analytics,” in2024 28th International Symposium on VLSI Design and Test (VDAT). IEEE, 2024, pp. 1–5. 10

2024
[10]

Early development of uvm based verification environment of image signal processing designs using tlm reference model of rtl,

A. Jain, D. H. Gupta, S. Jana, and K. Kumar, “Early development of uvm based verification environment of image signal processing designs using tlm reference model of rtl,”arXiv preprint arXiv:1408.1150, 2014

Pith/arXiv arXiv 2014
[11]

Hlstrans: Dataset for c-to-hls hardware code synthesis,

Q. Zou, N. Chen, Y . Chen, B. He, and W. Wong, “Hlstrans: Dataset for c-to-hls hardware code synthesis,” 2025. [Online]. Available: https://arxiv.org/abs/2507.04315

arXiv 2025
[12]

Hlstester: Efficient testing of behavioral discrepancies with llms for high-level synthesis,

K. Xu, B. Li, G. L. Zhang, and U. Schlichtmann, “Hlstester: Efficient testing of behavioral discrepancies with llms for high-level synthesis,” in 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 2025, pp. 1–9

2025
[13]

Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,

S. Abi-Karam and C. Hao, “Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,” in2025 IEEE International Conference on LLM-Aided Design (ICLAD). IEEE, Jun. 2025, p. 219–226. [Online]. Available: http://dx.doi.org/10.1109/ ICLAD65226.2025.00021

arXiv 2025
[14]

Rtllm: An open-source benchmark for design rtl generation with large language model,

Y . Lu, S. Liu, Q. Zhang, and Z. Xie, “Rtllm: An open-source benchmark for design rtl generation with large language model,” in2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2024, pp. 722–727

2024
[15]

Llms will always hallucinate, and we need to live with this,

S. Banerjee, A. Agarwal, and S. Singla, “Llms will always hallucinate, and we need to live with this,” inIntelligent Systems Conference. Springer, 2025, pp. 624–648

2025
[16]

Formal verification of high-level synthesis,

Y . Herklotz, J. D. Pollard, N. Ramanathan, and J. Wickerson, “Formal verification of high-level synthesis,”Proceedings of the ACM on Pro- gramming Languages, vol. 5, no. OOPSLA, pp. 1–30, 2021

2021
[17]

Verification of scheduling of conditional behaviors in high-level synthesis,

R. Chouksey and C. Karfa, “Verification of scheduling of conditional behaviors in high-level synthesis,”IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 7, pp. 1638–1651, 2020

2020
[18]

Hlspilot: Llm-based high-level synthesis,

C. Xiong, C. Liu, H. Li, and X. Li, “Hlspilot: Llm-based high-level synthesis,”arXiv preprint arXiv:2408.06810, 2024

arXiv 2024
[19]

Reliable graph-rag for codebases: Ast-derived graphs vs llm-extracted knowledge graphs,

M. R. Chinthareddy, “Reliable graph-rag for codebases: Ast-derived graphs vs llm-extracted knowledge graphs,” 2026. [Online]. Available: https://arxiv.org/abs/2601.08773

arXiv 2026
[20]

Knowledge graph based repository-level code generation,

M. Athale and V . Vaddina, “Knowledge graph based repository-level code generation,” in2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), 2025, pp. 169–176

2025
[21]

Synthai: A multi agent generative ai framework for automated modular hls design generation,

S. A. Sheikholeslam and A. Ivanov, “Synthai: A multi agent generative ai framework for automated modular hls design generation,” 2024. [Online]. Available: https://arxiv.org/abs/2405.16072

arXiv 2024
[22]

Confibench: Automatic testbench generation with confidence-based scenario mask and testbench ensemble using llms for hdl design,

R. Qiu, G. L. Zhang, R. Drechsler, T. Ho, U. Schlichtmann, and B. Li, “Confibench: Automatic testbench generation with confidence-based scenario mask and testbench ensemble using llms for hdl design,”ACM Trans. Des. Autom. Electron. Syst., Oct. 2025, just Accepted. [Online]. Available: https://doi.org/10.1145/3773087

work page doi:10.1145/3773087 2025
[23]

Klee: unassisted and automatic generation of high-coverage tests for complex systems programs

C. Cadar, D. Dunbar, D. R. Engleret al., “Klee: unassisted and automatic generation of high-coverage tests for complex systems programs.” in OSDI, vol. 8, 2008, pp. 209–224

2008
[24]

A formal verification method of scheduling in high-level synthesis,

C. Karfa, C. Mandal, D. Sarkar, S. R. Pentakota, and C. Reade, “A formal verification method of scheduling in high-level synthesis,” in 7th International Symposium on Quality Electronic Design (ISQED’06). IEEE, 2006, pp. 6–pp

2006
[25]

A study about the effi- ciency of formal high-level synthesis applied to verification,

J. M. Mendias, R. Hermida, and O. Pe ˜nalba, “A study about the effi- ciency of formal high-level synthesis applied to verification,”Integration, vol. 31, no. 2, pp. 101–131, 2002

2002
[26]

C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,

L. Collini, S. Garg, and R. Karri, “C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,” ACM Transactions on Design Automation of Electronic Systems, vol. 30, no. 6, p. 1–24, Oct. 2025. [Online]. Available: http: //dx.doi.org/10.1145/3734524

work page doi:10.1145/3734524 2025
[27]

Hlsdebugger: Identification and cor- rection of logic bugs in hls code with llm solutions,

J. Wang, S. Liu, Y . Lu, and Z. Xie, “Hlsdebugger: Identification and cor- rection of logic bugs in hls code with llm solutions,” in2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 2025, pp. 1–9

2025
[28]

Software unit test coverage and adequacy,

H. Zhu, P. A. Hall, and J. H. May, “Software unit test coverage and adequacy,”Acm computing surveys (csur), vol. 29, no. 4, pp. 366–427, 1997

1997
[29]

Llm4dv: Using large language models for hardware test stimuli generation,

Z. Zhang, B. Szekely, P. Gimenes, G. Chadwick, H. McNally, J. Cheng, R. Mullins, and Y . Zhao, “Llm4dv: Using large language models for hardware test stimuli generation,” in2025 IEEE 33rd Annual Interna- tional Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2025, pp. 133–137

2025
[30]

Satisfiability modulo theories,

C. Barrett and C. Tinelli, “Satisfiability modulo theories,” inHandbook of model checking. Springer, 2018, pp. 305–343

2018
[31]

Cute: A concolic unit testing engine for c,

K. Sen, D. Marinov, and G. Agha, “Cute: A concolic unit testing engine for c,”ACM SIGSOFT software engineering notes, vol. 30, no. 5, pp. 263–272, 2005

2005
[32]

Arks: Active retrieval in knowledge soup for code generation,

H. Su, S. Jiang, Y . Lai, H. Wu, B. Shi, C. Liu, Q. Liu, and T. Yu, “Arks: Active retrieval in knowledge soup for code generation,”arXiv preprint arXiv:2402.12317, 2024

arXiv 2024
[33]

Context-augmented code generation using pro- gramming knowledge graphs,

I. Saberi and F. Fard, “Context-augmented code generation using pro- gramming knowledge graphs,”arXiv preprint arXiv:2410.18251, 2024

arXiv 2024
[34]

Knowledge graph-guided retrieval augmented generation,

X. Zhu, Y . Xie, Y . Liu, Y . Li, and W. Hu, “Knowledge graph-guided retrieval augmented generation,” inProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Compu- tational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025, pp. 8912–8924

2025
[35]

cast: Enhancing code retrieval-augmented generation with structural chunking via abstract syntax tree,

Y . Zhang, X. Zhao, Z. Z. Wang, C. Yang, J. Wei, and T. Wu, “cast: Enhancing code retrieval-augmented generation with structural chunking via abstract syntax tree,”arXiv preprint arXiv:2506.15655, 2025

arXiv 2025
[36]

An iteratively-refined dataset for high-level synthesis functional verification through llm-aided bug injection,

L. J. Wan, H. Ye, J. Wang, M. Jha, and D. Chen, “An iteratively-refined dataset for high-level synthesis functional verification through llm-aided bug injection,”2024 IEEE LLM Aided Design Workshop (LAD), pp. 1–6, 2024

2024
[37]

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond,

S. Abi-Karam, R. Sarkar, A. Seigler, S. Lowe, Z. Wei, H. Chen, N. Rao, L. John, A. Arora, and C. Hao, “HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond,” in2024 ACM/IEEE 6th Symposium on Machine Learning for CAD (MLCAD). Salt Lake City (Snowbird), UT, USA: IEEE, Sep. 2024, pp. 1–9

2024
[38]

Forgebench: A machine learning benchmark suite and auto-generation framework for next-generation hls tools,

A. Wanna, H. Chen, and C. Hao, “Forgebench: A machine learning benchmark suite and auto-generation framework for next-generation hls tools,” 2025. [Online]. Available: https://arxiv.org/abs/2504.15185

arXiv 2025
[39]

Machsuite: Benchmarks for accelerator design and customized architectures,

B. Reagen, R. Adolf, Y . S. Shao, G.-Y . Wei, and D. Brooks, “Machsuite: Benchmarks for accelerator design and customized architectures,” in 2014 IEEE International Symposium on Workload Characterization (IISWC), 2014, pp. 110–119

2014
[40]

Parallel programming for fpgas,

R. Kastner, J. Matai, and S. Neuendorffer, “Parallel programming for fpgas,” 2018. [Online]. Available: https://arxiv.org/abs/1805.03648

Pith/arXiv arXiv 2018
[41]

Hlsdataset: Open-source dataset for ml-assisted fpga design using high level synthesis,

Z. Wei, A. Arora, R. Li, and L. John, “Hlsdataset: Open-source dataset for ml-assisted fpga design using high level synthesis,” in2023 IEEE 34th International Conference on Application-specific Systems, Archi- tectures and Processors (ASAP), 2023, pp. 197–204

2023

[1] [1]

A review on ai acceleration with fpga and asics: A comparative study,

M. Kotari, S. Kanabargi, S. Aloriet al., “A review on ai acceleration with fpga and asics: A comparative study,” in2025 IEEE 4th International Conference on Data, Decision and Systems (ICDDS). IEEE, 2025, pp. 212–215

2025

[2] [2]

A survey on convolutional neural network accelerators: Gpu, fpga and asic,

Y . Hu, Y . Liu, and Z. Liu, “A survey on convolutional neural network accelerators: Gpu, fpga and asic,” in2022 14th International Conference on Computer Research and Development (ICCRD). IEEE, 2022, pp. 100–107

2022

[3] [3]

Enhancing asic hls design optimization via knowledge transfer from fpga hls models,

S. Zheng, Z. Ding, C. Wang, B. Yu, M. Wong, Y . Sun, and J. Cong, “Enhancing asic hls design optimization via knowledge transfer from fpga hls models,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2026

2026

[4] [4]

Hls-based optimization and design space explo- ration for applications with variable loop bounds,

Y .-k. Choi and J. Cong, “Hls-based optimization and design space explo- ration for applications with variable loop bounds,” in2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2018, pp. 1–8

2018

[5] [5]

Universal verification methodology (uvm) 1.2 user’s guide,

A. Initiative, “Universal verification methodology (uvm) 1.2 user’s guide,”Accellera Systems Initiative: Elk Grove, CA, USA, 2015

2015

[6] [6]

High-level synthesis: Past, present, and future,

G. Martin and G. Smith, “High-level synthesis: Past, present, and future,” IEEE Design & Test of Computers, vol. 26, no. 4, pp. 18–25, 2009

2009

[7] [7]

Legup: high-level synthesis for fpga-based processor/accelerator systems,

A. Canis, J. Choi, M. Aldham, V . Zhang, A. Kammoona, J. H. An- derson, S. Brown, and T. Czajkowski, “Legup: high-level synthesis for fpga-based processor/accelerator systems,” inProceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays, 2011, pp. 33–36

2011

[8] [8]

Fpga- based solution for on-board verification of hardware modules using hls,

J. Caba, F. Rinc ´on, J. Barba, J. A. de la Torre, and J. C. L ´opez, “Fpga- based solution for on-board verification of hardware modules using hls,” Electronics, vol. 9, no. 12, p. 2024, 2020

2024

[9] [9]

A shift-left approach in qualifi- cation of digital ips for socs by applying advanced automation and data analytics,

H. J. Chakraborty and D. K. Acharya, “A shift-left approach in qualifi- cation of digital ips for socs by applying advanced automation and data analytics,” in2024 28th International Symposium on VLSI Design and Test (VDAT). IEEE, 2024, pp. 1–5. 10

2024

[10] [10]

Early development of uvm based verification environment of image signal processing designs using tlm reference model of rtl,

A. Jain, D. H. Gupta, S. Jana, and K. Kumar, “Early development of uvm based verification environment of image signal processing designs using tlm reference model of rtl,”arXiv preprint arXiv:1408.1150, 2014

Pith/arXiv arXiv 2014

[11] [11]

Hlstrans: Dataset for c-to-hls hardware code synthesis,

Q. Zou, N. Chen, Y . Chen, B. He, and W. Wong, “Hlstrans: Dataset for c-to-hls hardware code synthesis,” 2025. [Online]. Available: https://arxiv.org/abs/2507.04315

arXiv 2025

[12] [12]

Hlstester: Efficient testing of behavioral discrepancies with llms for high-level synthesis,

K. Xu, B. Li, G. L. Zhang, and U. Schlichtmann, “Hlstester: Efficient testing of behavioral discrepancies with llms for high-level synthesis,” in 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 2025, pp. 1–9

2025

[13] [13]

Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,

S. Abi-Karam and C. Hao, “Hls-eval: A benchmark and framework for evaluating llms on high-level synthesis design tasks,” in2025 IEEE International Conference on LLM-Aided Design (ICLAD). IEEE, Jun. 2025, p. 219–226. [Online]. Available: http://dx.doi.org/10.1109/ ICLAD65226.2025.00021

arXiv 2025

[14] [14]

Rtllm: An open-source benchmark for design rtl generation with large language model,

Y . Lu, S. Liu, Q. Zhang, and Z. Xie, “Rtllm: An open-source benchmark for design rtl generation with large language model,” in2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2024, pp. 722–727

2024

[15] [15]

Llms will always hallucinate, and we need to live with this,

S. Banerjee, A. Agarwal, and S. Singla, “Llms will always hallucinate, and we need to live with this,” inIntelligent Systems Conference. Springer, 2025, pp. 624–648

2025

[16] [16]

Formal verification of high-level synthesis,

Y . Herklotz, J. D. Pollard, N. Ramanathan, and J. Wickerson, “Formal verification of high-level synthesis,”Proceedings of the ACM on Pro- gramming Languages, vol. 5, no. OOPSLA, pp. 1–30, 2021

2021

[17] [17]

Verification of scheduling of conditional behaviors in high-level synthesis,

R. Chouksey and C. Karfa, “Verification of scheduling of conditional behaviors in high-level synthesis,”IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 7, pp. 1638–1651, 2020

2020

[18] [18]

Hlspilot: Llm-based high-level synthesis,

C. Xiong, C. Liu, H. Li, and X. Li, “Hlspilot: Llm-based high-level synthesis,”arXiv preprint arXiv:2408.06810, 2024

arXiv 2024

[19] [19]

Reliable graph-rag for codebases: Ast-derived graphs vs llm-extracted knowledge graphs,

M. R. Chinthareddy, “Reliable graph-rag for codebases: Ast-derived graphs vs llm-extracted knowledge graphs,” 2026. [Online]. Available: https://arxiv.org/abs/2601.08773

arXiv 2026

[20] [20]

Knowledge graph based repository-level code generation,

M. Athale and V . Vaddina, “Knowledge graph based repository-level code generation,” in2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), 2025, pp. 169–176

2025

[21] [21]

Synthai: A multi agent generative ai framework for automated modular hls design generation,

S. A. Sheikholeslam and A. Ivanov, “Synthai: A multi agent generative ai framework for automated modular hls design generation,” 2024. [Online]. Available: https://arxiv.org/abs/2405.16072

arXiv 2024

[22] [22]

Confibench: Automatic testbench generation with confidence-based scenario mask and testbench ensemble using llms for hdl design,

R. Qiu, G. L. Zhang, R. Drechsler, T. Ho, U. Schlichtmann, and B. Li, “Confibench: Automatic testbench generation with confidence-based scenario mask and testbench ensemble using llms for hdl design,”ACM Trans. Des. Autom. Electron. Syst., Oct. 2025, just Accepted. [Online]. Available: https://doi.org/10.1145/3773087

work page doi:10.1145/3773087 2025

[23] [23]

Klee: unassisted and automatic generation of high-coverage tests for complex systems programs

C. Cadar, D. Dunbar, D. R. Engleret al., “Klee: unassisted and automatic generation of high-coverage tests for complex systems programs.” in OSDI, vol. 8, 2008, pp. 209–224

2008

[24] [24]

A formal verification method of scheduling in high-level synthesis,

C. Karfa, C. Mandal, D. Sarkar, S. R. Pentakota, and C. Reade, “A formal verification method of scheduling in high-level synthesis,” in 7th International Symposium on Quality Electronic Design (ISQED’06). IEEE, 2006, pp. 6–pp

2006

[25] [25]

A study about the effi- ciency of formal high-level synthesis applied to verification,

J. M. Mendias, R. Hermida, and O. Pe ˜nalba, “A study about the effi- ciency of formal high-level synthesis applied to verification,”Integration, vol. 31, no. 2, pp. 101–131, 2002

2002

[26] [26]

C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,

L. Collini, S. Garg, and R. Karri, “C2hlsc: Leveraging large language models to bridge the software-to-hardware design gap,” ACM Transactions on Design Automation of Electronic Systems, vol. 30, no. 6, p. 1–24, Oct. 2025. [Online]. Available: http: //dx.doi.org/10.1145/3734524

work page doi:10.1145/3734524 2025

[27] [27]

Hlsdebugger: Identification and cor- rection of logic bugs in hls code with llm solutions,

J. Wang, S. Liu, Y . Lu, and Z. Xie, “Hlsdebugger: Identification and cor- rection of logic bugs in hls code with llm solutions,” in2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 2025, pp. 1–9

2025

[28] [28]

Software unit test coverage and adequacy,

H. Zhu, P. A. Hall, and J. H. May, “Software unit test coverage and adequacy,”Acm computing surveys (csur), vol. 29, no. 4, pp. 366–427, 1997

1997

[29] [29]

Llm4dv: Using large language models for hardware test stimuli generation,

Z. Zhang, B. Szekely, P. Gimenes, G. Chadwick, H. McNally, J. Cheng, R. Mullins, and Y . Zhao, “Llm4dv: Using large language models for hardware test stimuli generation,” in2025 IEEE 33rd Annual Interna- tional Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, 2025, pp. 133–137

2025

[30] [30]

Satisfiability modulo theories,

C. Barrett and C. Tinelli, “Satisfiability modulo theories,” inHandbook of model checking. Springer, 2018, pp. 305–343

2018

[31] [31]

Cute: A concolic unit testing engine for c,

K. Sen, D. Marinov, and G. Agha, “Cute: A concolic unit testing engine for c,”ACM SIGSOFT software engineering notes, vol. 30, no. 5, pp. 263–272, 2005

2005

[32] [32]

Arks: Active retrieval in knowledge soup for code generation,

H. Su, S. Jiang, Y . Lai, H. Wu, B. Shi, C. Liu, Q. Liu, and T. Yu, “Arks: Active retrieval in knowledge soup for code generation,”arXiv preprint arXiv:2402.12317, 2024

arXiv 2024

[33] [33]

Context-augmented code generation using pro- gramming knowledge graphs,

I. Saberi and F. Fard, “Context-augmented code generation using pro- gramming knowledge graphs,”arXiv preprint arXiv:2410.18251, 2024

arXiv 2024

[34] [34]

Knowledge graph-guided retrieval augmented generation,

X. Zhu, Y . Xie, Y . Liu, Y . Li, and W. Hu, “Knowledge graph-guided retrieval augmented generation,” inProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Compu- tational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025, pp. 8912–8924

2025

[35] [35]

cast: Enhancing code retrieval-augmented generation with structural chunking via abstract syntax tree,

Y . Zhang, X. Zhao, Z. Z. Wang, C. Yang, J. Wei, and T. Wu, “cast: Enhancing code retrieval-augmented generation with structural chunking via abstract syntax tree,”arXiv preprint arXiv:2506.15655, 2025

arXiv 2025

[36] [36]

An iteratively-refined dataset for high-level synthesis functional verification through llm-aided bug injection,

L. J. Wan, H. Ye, J. Wang, M. Jha, and D. Chen, “An iteratively-refined dataset for high-level synthesis functional verification through llm-aided bug injection,”2024 IEEE LLM Aided Design Workshop (LAD), pp. 1–6, 2024

2024

[37] [37]

HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond,

S. Abi-Karam, R. Sarkar, A. Seigler, S. Lowe, Z. Wei, H. Chen, N. Rao, L. John, A. Arora, and C. Hao, “HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond,” in2024 ACM/IEEE 6th Symposium on Machine Learning for CAD (MLCAD). Salt Lake City (Snowbird), UT, USA: IEEE, Sep. 2024, pp. 1–9

2024

[38] [38]

Forgebench: A machine learning benchmark suite and auto-generation framework for next-generation hls tools,

A. Wanna, H. Chen, and C. Hao, “Forgebench: A machine learning benchmark suite and auto-generation framework for next-generation hls tools,” 2025. [Online]. Available: https://arxiv.org/abs/2504.15185

arXiv 2025

[39] [39]

Machsuite: Benchmarks for accelerator design and customized architectures,

B. Reagen, R. Adolf, Y . S. Shao, G.-Y . Wei, and D. Brooks, “Machsuite: Benchmarks for accelerator design and customized architectures,” in 2014 IEEE International Symposium on Workload Characterization (IISWC), 2014, pp. 110–119

2014

[40] [40]

Parallel programming for fpgas,

R. Kastner, J. Matai, and S. Neuendorffer, “Parallel programming for fpgas,” 2018. [Online]. Available: https://arxiv.org/abs/1805.03648

Pith/arXiv arXiv 2018

[41] [41]

Hlsdataset: Open-source dataset for ml-assisted fpga design using high level synthesis,

Z. Wei, A. Arora, R. Li, and L. John, “Hlsdataset: Open-source dataset for ml-assisted fpga design using high level synthesis,” in2023 IEEE 34th International Conference on Application-specific Systems, Archi- tectures and Processors (ASAP), 2023, pp. 197–204

2023