
arxiv: 2504.19959 · v4 · submitted 2025-04-28 · 💻 cs.AR · cs.AI

From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

Pith reviewed 2026-05-22 17:53 UTC · model grok-4.3

classification 💻 cs.AR cs.AI
keywords: UVM, LLM, RTL verification, testbench generation, coverage-driven refinement, automated EDA, IC design verification

The pith

UVM^2 uses LLMs to generate and iteratively refine UVM testbenches from RTL designs using coverage feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UVM^2, an automated framework that employs large language models to create Universal Verification Methodology testbenches and then refines them step by step based on coverage results from EDA tools. Verification consumes nearly 70 percent of IC development effort today, so replacing much of the manual coding and tool orchestration with this loop could free engineers from repetitive work while preserving structured, reusable testbenches. Tests on RTL designs up to 1.6K lines show substantial cuts in setup time and coverage rates that exceed earlier automated solutions.

Core claim

UVM^2 leverages LLMs to generate UVM testbenches for RTL designs and iteratively refines them using coverage feedback, reducing testbench setup time by up to UVM^2 compared to experienced engineers while achieving average code coverage of 87.44 percent and function coverage of 89.58 percent, outperforming state-of-the-art solutions by 20.96 percent and 23.51 percent respectively.
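The summary does not say whether the 20.96 and 23.51 figures are percentage-point gaps or relative improvements. The sketch below works out the prior-work baseline that each reading would imply. The helper name `implied_baseline` is hypothetical, and neither derived baseline is a number reported in the source.

```python
def implied_baseline(reported: float, gain: float, relative: bool) -> float:
    """Back out the baseline coverage (in percent) implied by a reported gain.

    relative=False treats `gain` as percentage points (reported - gain).
    relative=True treats `gain` as a relative improvement in percent.
    Both readings are assumptions; the source does not state which applies.
    """
    if relative:
        return reported / (1.0 + gain / 100.0)
    return reported - gain


# Figures quoted in the core claim: (reported coverage, claimed gain).
claims = {"code": (87.44, 20.96), "function": (89.58, 23.51)}

for name, (reported, gain) in claims.items():
    points = implied_baseline(reported, gain, relative=False)
    ratio = implied_baseline(reported, gain, relative=True)
    print(f"{name}: {points:.2f} (percentage points) or {ratio:.2f} (relative)")
```

The two readings differ by several points of baseline coverage, so which one the paper intends affects how large the improvement actually is.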

What carries the argument

The iterative refinement loop that feeds coverage metrics back to the LLM so it can produce progressively better and still-valid UVM testbench code and stimuli.
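A minimal sketch of such a loop follows. It is an illustration under stated assumptions, not the paper's implementation: `refine`, `SimResult`, `make_fake_llm`, and `fake_simulate` are hypothetical names, the LLM and the EDA simulator are replaced by deterministic stubs, and the stub's coverage formula is invented only so the example runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class SimResult:
    compiled: bool   # whether the testbench compiled and ran
    coverage: float  # coverage in percent, 0.0 to 100.0
    report: str      # textual feedback (errors or uncovered items)


def refine(
    generate: Callable[[str, Optional[str]], str],
    simulate: Callable[[str], SimResult],
    rtl: str,
    target: float = 90.0,
    max_iters: int = 5,
) -> Tuple[str, List[float]]:
    """Return the best valid testbench found and the coverage history."""
    best_tb, best_cov = "", -1.0
    history: List[float] = []
    feedback: Optional[str] = None
    for _ in range(max_iters):
        tb = generate(rtl, feedback)
        result = simulate(tb)
        feedback = result.report      # fed back into the next generation
        if not result.compiled:
            continue                  # never keep an invalid testbench
        history.append(result.coverage)
        if result.coverage > best_cov:
            best_tb, best_cov = tb, result.coverage
        if best_cov >= target:
            break
    return best_tb, history


def make_fake_llm() -> Callable[[str, Optional[str]], str]:
    """Hypothetical stand-in for an LLM: emits one more sequence per call."""
    calls = {"n": 0}

    def generate(rtl: str, feedback: Optional[str]) -> str:
        calls["n"] += 1
        return f"// tb v{calls['n']} for {rtl}\n" + "seq;\n" * calls["n"]

    return generate


def fake_simulate(tb: str) -> SimResult:
    """Hypothetical stand-in for a simulator: more sequences, more coverage."""
    cov = min(100.0, 40.0 + 20.0 * tb.count("seq;"))
    return SimResult(True, cov, f"uncovered bins remain at {cov:.1f}%")


best, hist = refine(make_fake_llm(), fake_simulate, "adder")
print(hist)  # [60.0, 80.0, 100.0]
```

Keeping only the best testbench that compiled is one simple way to stop the loop from drifting into invalid code; whether the paper uses the same guard is not stated here.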

If this is right

  • Verification engineers spend far less time on manual coding and repeated EDA tool runs.
  • Average code and functional coverage exceed what prior automated methods reach.
  • The same coverage-driven loop can be applied to other RTL designs of similar scale.
  • Structured UVM testbenches remain reusable even when generated automatically.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could later incorporate bug-detection signals or formal properties as additional feedback.
  • Scaling to designs much larger than 1.6K lines would require testing how well the loop maintains validity.
  • Integration with existing EDA flows might allow fully hands-off verification pipelines.

Load-bearing premise

Coverage metrics alone give the LLM enough clear signal to improve the testbench without drifting into invalid code or requiring human fixes.

What would settle it

Run the loop on a fresh RTL design and check whether coverage stops rising or the generated code becomes invalid after a few iterations.
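One way to make that check mechanical is to classify a run from its per-iteration coverage. The sketch below is hypothetical: the function name `diagnose`, the `window` and `eps` thresholds, and the convention that `None` marks an iteration whose testbench failed to compile are all assumptions chosen for illustration.

```python
from typing import List, Optional


def diagnose(history: List[Optional[float]], window: int = 2, eps: float = 0.5) -> str:
    """Classify a refinement run.

    history holds coverage in percent per iteration; None marks an
    iteration whose generated testbench was invalid.
    """
    if history and history[-1] is None:
        return "invalid"          # the loop ended on broken code
    valid = [c for c in history if c is not None]
    if len(valid) <= window:
        return "inconclusive"     # too few valid points to judge a trend
    gain = valid[-1] - valid[-1 - window]
    return "plateau" if gain < eps else "improving"


print(diagnose([60.0, 75.0, 88.0]))        # improving
print(diagnose([60.0, 80.0, 80.2, 80.3]))  # plateau
print(diagnose([60.0, None]))              # invalid
```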

Figures

Figures reproduced from arXiv: 2504.19959 by Dingrong Pan, Jie Zhou, Junhao Ye, Ke Xu, Nan Guan, Qichun Chen, Shuai Zhao, Xinwei Fang, Xi Wang, Yuchen Hu, Zhe Jiang.

Figure 1: Breakdown of the IC frontend design and verification workflow.
Figure 2: Overview of the UVM^2 Framework, which integrates UVM with LLM agents to automate the IC verification workflow. The framework includes Analysis Agent (AgentA) for test planning, Generation Agent (AgentG) for automatic testbench creation and error-driven regeneration, and Optimisation Agent (AgentO) for iterative testcase supplement based on coverage analysis.
Figure 3: Prompt Instructions for AgentA. UVM^2 breaks down the analysis into a structured reasoning pipeline that mimics expert verification engineers.
Figure 4: Dependency-Driven UVM Testbench Generation Workflow.
Figure 5: The “Top” module Template in the Testbench.
Figure 6: Prompt Instructions for AgentG. … generated by the top module using a template. After extracting the module name and port signal from the spec, the template will automatically fill in and generate the complete component. LLM-Based Generation. In contrast, components with complex behaviour, such as Driver, Monitor, and Scoreboard, require precise functional encoding tailored to specific testcase semantics and…
Figure 7: Testcase Supplement Workflow with Coverage Analysis.
Figure 8: Prompt Instructions for AgentO. … tackle this issue, we introduce a testcase optimisation mechanism that enhances stimulus generation through sequence refinement, focusing on the protocol-level behaviours defined within UVM.
Figure 9: SRG of 11 UVM components with UVM^2 against SOTA LLMs. RQ2: How does the verification completeness achieved by UVM^2 compare to existing LLM-based verification approaches? This question investigates whether UVM^2 can generate testbenches that reach comparable or higher code and functional testcase coverage than other LLM-driven methods. RQ3: How much of a performance gain in terms of efficiency can end-to-end …
Figure 10: Four categories of errors in LLM-generated UVM components and their corrections.
Figure 11: Code Coverage and Function Coverage of UVM
Figure 12: Coverage improvement via testcase supplement
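The captions for Figures 5 and 6 mention a template for the testbench's top module that is filled in from the module name and port signals. A minimal sketch of that kind of template filling is shown below. The template text, the `render_top` helper, and the example `adder` ports are hypothetical and much simpler than a real UVM top module (no interface or clock generation); the paper's actual template is not reproduced on this page.

```python
from string import Template
from typing import List

# Hypothetical, heavily simplified top-module template.
TOP_TEMPLATE = Template(
    "module top;\n"
    "  import uvm_pkg::*;\n"
    "$wires"
    "  $dut dut ($conns);\n"
    "  initial run_test();\n"
    "endmodule\n"
)


def render_top(dut: str, ports: List[str]) -> str:
    """Fill the template from a module name and its port names."""
    wires = "".join(f"  logic {p};\n" for p in ports)
    conns = ", ".join(f".{p}({p})" for p in ports)
    return TOP_TEMPLATE.substitute(dut=dut, wires=wires, conns=conns)


print(render_top("adder", ["a", "b", "sum"]))
```

Because the output depends only on the extracted names, a component like this can be produced deterministically, leaving the LLM for components whose behaviour cannot be templated.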
Original abstract

Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise from the considerable manual coding effort required, repetitive manual execution of multiple EDA tools, and the need for in-depth domain expertise to navigate complex designs. Here, we present UVM^2, an automated verification framework that leverages Large Language Models (LLMs) to generate UVM testbenches and iteratively refine them using coverage feedback, significantly reducing manual effort while maintaining rigorous verification standards. To evaluate UVM^2, we introduce a benchmark suite comprising Register Transfer Level (RTL) designs of up to 1.6K lines of code. The results show that UVM^2 reduces testbench setup time by up to UVM^2 compared to experienced engineers, and achieves average code and function coverage of 87.44% and 89.58%, outperforming state-of-the-art solutions by 20.96% and 23.51%, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces UVM^2, an automated framework that employs large language models to generate UVM testbenches for RTL designs and iteratively refines them using coverage feedback. It presents a new benchmark suite of RTL designs up to 1.6k LOC and reports that UVM^2 reduces testbench setup time by up to UVM^2 relative to experienced engineers while achieving average code coverage of 87.44% and functional coverage of 89.58%, outperforming state-of-the-art methods by 20.96% and 23.51% respectively.

Significance. If the reported coverage gains and time reductions hold under rigorous controls, the work could meaningfully alleviate the verification bottleneck that consumes ~70% of IC development effort. The introduction of a dedicated benchmark suite is a constructive contribution that enables future comparisons, though the framework's dependence on external LLM calls and coverage-driven iteration must be shown to generalize beyond the evaluated designs.

major comments (3)
  1. [Abstract and §4] The iterative refinement loop is presented as relying solely on coverage metrics to produce progressively better, syntactically valid UVM testbenches, yet no details are supplied on prompt templates, handling of invalid LLM outputs, or safeguards against over-constrained sequences or hallucinated components. This assumption is load-bearing for the claimed 87.44%/89.58% coverage figures and the 20.96%/23.51% outperformance.
  2. [§5, Experimental Setup] The comparison against experienced engineers does not specify the engineers' experience level, the precise tasks timed, measurement protocol, or controls for post-hoc adjustments and benchmark selection. Without these, the reported setup-time reduction cannot be evaluated as a reproducible result.
  3. [§5.2, Benchmark Suite] The new RTL benchmark suite (designs ≤1.6k LOC) is introduced without disclosure of selection criteria, public availability, or verification that the designs exercise realistic stimulus-generation and interface challenges rather than permitting coverage inflation on narrow cases.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'up to UVM^2' for the time reduction appears to be a placeholder or typographical error and should be replaced by a concrete numerical factor.
  2. [Abstract] Notation: The framework name UVM^2 is used both for the system and in the time-reduction claim, creating unnecessary ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the insightful comments and recommendations. We address each of the major comments in detail below, indicating the revisions we plan to make to enhance the manuscript's clarity, reproducibility, and rigor.

Point-by-point responses
  1. Referee: [Abstract and §4] The iterative refinement loop is presented as relying solely on coverage metrics to produce progressively better, syntactically valid UVM testbenches, yet no details are supplied on prompt templates, handling of invalid LLM outputs, or safeguards against over-constrained sequences or hallucinated components. This assumption is load-bearing for the claimed 87.44%/89.58% coverage figures and the 20.96%/23.51% outperformance.

    Authors: We agree that additional details on the iterative refinement process are necessary to fully substantiate our results. In the revised manuscript, we will expand Section 4 to include the specific prompt templates employed for generating and refining UVM testbenches. We will also describe our approach to handling invalid LLM outputs, such as syntax error detection and iterative prompting for corrections. Furthermore, we will detail safeguards implemented to mitigate over-constrained sequences and potential hallucinations, including validation checks and component verification steps. These enhancements will provide greater transparency and support the reported coverage metrics. revision: yes

  2. Referee: [§5] The comparison against experienced engineers does not specify the engineers' experience level, the precise tasks timed, measurement protocol, or controls for post-hoc adjustments and benchmark selection. Without these, the reported setup-time reduction cannot be evaluated as a reproducible result.

    Authors: We acknowledge the need for more precise documentation of the experimental comparison. In the revision to Section 5, we will provide details on the experience levels of the participating engineers, specifying their years of industry experience with UVM and RTL verification. We will clarify the precise tasks that were timed, including testbench creation and initial stimulus setup. The measurement protocol will be described, including the tools and methods used for timing. Additionally, we will outline the controls employed, such as the use of identical benchmark designs and procedures to prevent post-hoc adjustments or selection bias. This will allow for better evaluation of the time reduction claims. revision: yes

  3. Referee: [§5.2] The new RTL benchmark suite (designs ≤1.6k LOC) is introduced without disclosure of selection criteria, public availability, or verification that the designs exercise realistic stimulus-generation and interface challenges rather than permitting coverage inflation on narrow cases.

    Authors: We recognize the importance of detailing the benchmark suite for reproducibility and validity. In the revised Section 5.2, we will disclose the selection criteria used for the RTL designs, emphasizing diversity in size, functionality, and complexity up to 1.6k LOC. We commit to making the benchmark suite publicly available, for example via a GitHub repository, upon publication. We will also include a discussion or additional analysis verifying that the designs incorporate realistic stimulus-generation requirements and interface challenges, thereby demonstrating that the coverage results are not due to narrow or inflated cases. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results are measured outcomes on external benchmarks

full rationale

The paper introduces UVM^2 as an LLM-driven framework that generates and refines UVM testbenches via coverage feedback, then reports measured performance (setup time reductions, 87.44% code coverage, 89.58% functional coverage) on a newly created benchmark suite of RTL designs up to 1.6k LOC. These quantities are direct experimental outputs from running the system against external LLMs and EDA tools rather than quantities defined in terms of the paper's own fitted parameters or self-referential equations. No self-definitional steps, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear in the abstract or described evaluation chain; the results remain falsifiable against independent benchmarks and do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on two domain assumptions about LLM behavior and the informativeness of coverage metrics; no numerical free parameters or new physical entities are introduced.

axioms (2)
  • domain assumption: Large language models can be prompted to produce syntactically correct and functionally useful UVM testbench code for given RTL designs
    Invoked when the framework delegates testbench creation to the LLM.
  • domain assumption: Coverage feedback supplies sufficient guidance for the LLM to iteratively improve testbench quality without external human fixes
    Required for the closed-loop refinement process described in the abstract.
invented entities (1)
  • UVM^2 framework (no independent evidence)
    purpose: Automated LLM-aided system for generating and refining UVM testbenches
    The proposed end-to-end verification machine itself.




Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification

    cs.AR 2026-05 unverdicted novelty 7.0

    UVMarvel automatically constructs subsystem-level UVM testbenches for mainstream bus protocols using LLMs, an IR, and supporting libraries, reaching 95.65% average code coverage in 4.5 hours of automated runtime.

  2. HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs

    cs.AR 2026-04 unverdicted novelty 7.0

    HAVEN combines LLM agents for planning and gap analysis with protocol-specific templates and a custom DSL to generate correct UVM testbenches, achieving 100% compilation success, 90.6% code coverage, and 87.9% functio...

  3. Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

    cs.AR 2026-04 unverdicted novelty 6.0

    Spec2Cov uses an LLM agent in a feedback loop with a hardware simulator to generate tests from specs, achieving 100% coverage on simple designs and up to 49% on complex ones across 26 benchmarks.

  4. Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification

    cs.AR 2026-04 unverdicted novelty 5.0

    Domain-specialized LLM agents for hardware verification close 95-99% coverage using 4-13x fewer tokens and 2-4x faster convergence than general-purpose agents by reallocating tokens toward coverage-directed reasoning.

  5. Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

    cs.AR 2026-04 unverdicted novelty 5.0

    Spec2Cov uses an LLM-simulator feedback loop to generate tests from specs, reaching 100% coverage on simple designs and up to 49% on complex ones across 26 benchmarks.
