Recognition: unknown
From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation
Pith reviewed 2026-05-10 17:22 UTC · model grok-4.3
The pith
AgileAssert uses RTL semantic graphs and hybrid scoring to identify critical signals, then slices context so LLMs generate targeted assertions that cut total count by two thirds while holding or raising coverage.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgileAssert constructs RTL semantic graphs and identifies the top-K critical signals via a hybrid scoring and selection mechanism, followed by structure-aware RTL slicing to provide the LLM with precise targets and contextual information, thereby guiding LLMs to generate tightly constrained targeted assertions for efficient verification.
What carries the argument
RTL semantic graphs with hybrid scoring and selection to pick top-K functionally critical signals, followed by structure-aware slicing that supplies the LLM with focused context for targeted assertion generation.
If this is right
- On evaluated block- and CPU-level designs the approach reduces the number of assertions by an average of 66.68 percent compared with three existing state-of-the-art methods.
- Coverage metrics improve while the tokens fed to the LLM drop by 64 percent.
- In mutation testing the method exceeds prior approaches in error detection rate even though the average number of assertions falls by 72.74 percent.
- Verification effort moves from blanket coverage to cost-effective targeted coverage that still exercises the design's core functionality.
Where Pith is reading between the lines
- The same graph-plus-scoring step could be reused with other assertion languages or formal engines that accept signal-focused constraints.
- Lower token consumption opens the method to designs too large for full-context prompting without truncation.
- Targeted assertions may prove easier to review and maintain because each one is tied to a documented high-impact signal.
- Combining the sliced context with static timing or power analysis could further prioritize signals that affect both correctness and physical constraints.
Load-bearing premise
The hybrid scoring mechanism on the semantic graphs reliably identifies signals with the greatest impact on functionality and error propagation.
What would settle it
Run the method on a fresh block- or CPU-level design and measure whether the generated assertions achieve at least the same functional coverage and mutation error detection as full indiscriminate generation; if coverage falls or critical bugs are missed while assertion count drops, the claim does not hold.
Figures
read the original abstract
Functional verification has become the most time-consuming phase in IC development, and Assertion-Based Verification (ABV) is key to reducing debugging time. However, existing LLM-based assertion generation methods typically pursue indiscriminate verification, aiming for maximal coverage without considering signal criticality, whereas industrial practice demands maximizing coverage with minimal verification cost. Consequently, identifying signals that have the greatest impact on design functionality and error propagation-enabling a shift from indiscriminate to targeted verification-remains a key challenge. To address this, we propose AgileAssert, a key signal-driven assertion generation framework that constructs RTL semantic graphs and identifies the top-K critical signals via a hybrid scoring and selection mechanism, followed by structure-aware RTL slicing to provide the LLM with precise targets and contextual information, thereby guiding LLMs to generate tightly constrained targeted assertions for efficient verification. Evaluated on block- and CPU-level designs, with an average 66.68% reduction in assertions, our approach outperforms three existing SOTA methods, and significantly improving coverage metrics while reducing input token consumption by 64%. In mutation testing, when our approach surpasses existing methods in error detection rate, the average number of assertions used decreases by 72.74%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces AgileAssert, a framework for targeted assertion generation in RTL verification. It builds RTL semantic graphs, applies a hybrid scoring mechanism to select top-K critical signals based on functional impact and error propagation, performs structure-aware slicing to provide focused context, and uses LLMs to generate efficient assertions. On block- and CPU-level designs, it reports an average 66.68% reduction in assertions and 64% fewer input tokens versus three SOTA methods, with improved coverage metrics and, in mutation testing, a 72.74% reduction in assertions while achieving higher error detection rates.
Significance. If the results hold under rigorous validation, the work could meaningfully advance assertion-based verification by reducing the cost of generating and maintaining assertions without sacrificing effectiveness. The graph-driven targeting of signals addresses a practical gap between maximal-coverage LLM methods and industrial constraints on verification overhead.
major comments (2)
- §5 (Experimental Evaluation): The abstract and results claim concrete gains (66.68% assertion reduction, 64% token reduction, superior coverage and mutation scores) but supply no description of experimental setup, baseline reproduction details, number of designs/runs, statistical significance tests, or error bars; without these the outperformance claims cannot be assessed.
- §3.2 (Hybrid Scoring and Signal Selection): The hybrid scoring mechanism (graph centrality + fan-in/out + semantic features) is load-bearing for the central claim that targeted slicing produces better assertions than indiscriminate generation; however, no ablation studies, sensitivity analysis on K, or correlation of scores with injected faults/mutation outcomes are reported to show the scoring actually isolates high-impact signals rather than simply reducing context size.
minor comments (1)
- Abstract: The phrase 'significantly improving coverage metrics' is vague; specific metrics (e.g., line, branch, or functional coverage) and quantitative deltas should be stated.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the experimental reporting and analysis of the hybrid scoring mechanism.
read point-by-point responses
-
Referee: §5 (Experimental Evaluation): The abstract and results claim concrete gains (66.68% assertion reduction, 64% token reduction, superior coverage and mutation scores) but supply no description of experimental setup, baseline reproduction details, number of designs/runs, statistical significance tests, or error bars; without these the outperformance claims cannot be assessed.
Authors: We agree that additional experimental details are needed for full reproducibility and assessment of the claims. In the revised manuscript we will expand §5 to include: a complete list of the block-level and CPU-level designs with their characteristics, explicit reproduction steps and hyperparameters for the three SOTA baselines, the number of independent runs performed, statistical significance tests (e.g., paired t-tests with p-values), and error bars or standard deviations on all reported metrics including assertion counts, token usage, coverage, and mutation scores. revision: yes
-
Referee: §3.2 (Hybrid Scoring and Signal Selection): The hybrid scoring mechanism (graph centrality + fan-in/out + semantic features) is load-bearing for the central claim that targeted slicing produces better assertions than indiscriminate generation; however, no ablation studies, sensitivity analysis on K, or correlation of scores with injected faults/mutation outcomes are reported to show the scoring actually isolates high-impact signals rather than simply reducing context size.
Authors: We acknowledge that dedicated ablations would more directly isolate the contribution of the hybrid scoring. While the end-to-end comparisons already show gains in coverage and mutation detection, we will add the requested analyses in the revision: (1) component-wise ablations of the hybrid score, (2) sensitivity curves for different values of K reporting effects on assertion count, token usage, coverage, and error detection, and (3) correlation plots or tables linking per-signal hybrid scores to mutation outcomes. These additions will demonstrate that high-impact signals are preferentially selected rather than the benefit arising solely from smaller context. revision: yes
Circularity Check
No circularity: empirical claims rest on external SOTA comparisons, not internal derivations or self-referential fits
full rationale
The provided abstract and context contain no equations, derivations, or first-principles claims that reduce to their own inputs. The framework (RTL semantic graphs, hybrid scoring for top-K signals, structure-aware slicing, LLM assertion generation) is described procedurally, with performance quantified solely via empirical evaluation on block- and CPU-level designs against three external SOTA baselines. Metrics such as 66.68% assertion reduction, 64% token reduction, and mutation-testing gains are presented as measured outcomes rather than predictions forced by fitting or self-definition. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results appears. The derivation chain is therefore self-contained against external benchmarks, yielding a normal non-finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- K (number of top critical signals)
axioms (1)
- domain assumption RTL semantic graphs accurately capture design functionality and error propagation paths
Reference graph
Works this paper leans on
-
[1]
https://iwls.org/iwls2005/benchmarks.html
Iwls 2005 benchmaeks. https://iwls.org/iwls2005/benchmarks.html
2005
-
[2]
https://community.cadence.com/ cadence blogs 8/b/breakfast-bytes/posts/jasper-ml
Jaspergold: the next generation. https://community.cadence.com/ cadence blogs 8/b/breakfast-bytes/posts/jasper-ml
-
[3]
https://github.com/liangkangnan/tinyriscv
tinyriscv. https://github.com/liangkangnan/tinyriscv
-
[4]
Focs–automatic generation of simulation checkers from formal specifications
Yael Abarbanel, Ilan Beer, Leonid Gluhovsky, Sharon Keidar, and Yaron Wolfsthal. Focs–automatic generation of simulation checkers from formal specifications. InComputer Aided Verification: 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19,
2000
-
[5]
Springer, 2000
Proceedings 12, pages 538–542. Springer, 2000
2000
-
[6]
A survey of computer architecture simulation techniques and tools.Ieee Access, 7:78120–78145, 2019
Ayaz Akram and Lina Sawalha. A survey of computer architecture simulation techniques and tools.Ieee Access, 7:78120–78145, 2019
2019
-
[7]
Yunsheng Bai, Ghaith Bany Hamad, Syed Suhaib, and Haoxing Ren. Assertionforge: Enhancing formal verification assertion generation with structured representation of specifications and rtl.arXiv preprint arXiv:2503.19174, 2025
-
[8]
Betweenness centrality in large complex networks
Marc Barthelemy. Betweenness centrality in large complex networks. The European physical journal B, 38(2):163–168, 2004
2004
-
[9]
Improving design verifiability by early rtl coverability analysis
Kai-Hui Chang, Chia-Wei Chang, Jie-Hong Roland Jiang, and Chien- Nan Jimmy Liu. Improving design verifiability by early rtl coverability analysis. InTenth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMCODE2012), pages 25–32. IEEE, 2012
2012
-
[10]
LongRoPE: Extending LLM context window beyond 2 million tokens
Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. Longrope: Extend- ing llm context window beyond 2 million tokens.arXiv preprint arXiv:2402.13753, 2024
-
[11]
Wenji Fang, Mengming Li, Min Li, Zhiyuan Yan, Shang Liu, Zhiyao Xie, and Hongce Zhang. Assertllm: Generating and evaluating hardware verification assertions from design specifications via multi-llms.arXiv preprint arXiv:2402.00386, 2024
-
[12]
Harry Foster. 2024 wilson research group ic/asic functional verification trend report.https://resources.sw.siemens.com/zh-TW/white-paper-2024- wilson-research-group-ic-asic-functional-verification-trend-report/
2024
-
[13]
Assertion-based verification: Industry myths to realities (invited tutorial)
Harry Foster. Assertion-based verification: Industry myths to realities (invited tutorial). InComputer Aided Verification: 20th International Conference, CAV 2008 Princeton, NJ, USA, July 7-14, 2008 Proceedings 20, pages 5–10. Springer, 2008
2008
-
[14]
Pagerank beyond the web.siam REVIEW, 57(3):321– 363, 2015
David F Gleich. Pagerank beyond the web.siam REVIEW, 57(3):321– 363, 2015
2015
-
[15]
Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, et al. Realbench: Benchmarking verilog generation models with real-world ip designs.arXiv preprint arXiv:2507.16200, 2025
-
[16]
(security) assertions by large language models.IEEE Transactions on Information Forensics and Security, 2024
Rahul Kande, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Shailja Thakur, Ramesh Karri, and Jeyavijayan Rajendran. (security) assertions by large language models.IEEE Transactions on Information Forensics and Security, 2024
2024
-
[17]
Ai-accelerating coverage closure using intelligent stimulus generation
Jainender Kumar, Ronak Bhatt, Garima Srivastava, Ashutosh Sinha, and Prashant Teotia. Ai-accelerating coverage closure using intelligent stimulus generation
-
[18]
Assertgen: Enhancement of llm-aided assertion generation through cross-layer signal bridging
Hongqin Lyu, Yonghao Wang, Yunlin Du, Mingyu Shi, Zhiteng Chao, Wenxing Li, Tiancheng Wang, and Huawei Li. Assertgen: Enhancement of llm-aided assertion generation through cross-layer signal bridging. In 31th Asia and South Pacific Design Automation Conference. IEEE, 2026
2026
-
[19]
Assertminer: Module-level spec generation and assertion mining using static analysis guided llms
Hongqin Lyu, Yonghao Wang, Jiaxin Zhou, Zhiteng Chao, Tiancheng Wang, and Huawei Li. Assertminer: Module-level spec generation and assertion mining using static analysis guided llms. In2026 31st Asia and South Pacific Design Automation Conference (ASP-DAC), pages 432– 438, 2026
2026
-
[20]
Using of jaccard coefficient for keywords similarity
Suphakit Niwattanakul, Jatsada Singthongchai, Ekkachai Naenudorn, and Supachanun Wanapu. Using of jaccard coefficient for keywords similarity. InProceedings of the international multiconference of engineers and computer scientists, volume 1, pages 380–384, 2013
2013
-
[21]
Enhanced vlsi assertion generation: Con- forming to high-level specifications and reducing llm hallucinations with rag
Hafiz Abdul Quddus, Md Sanowar Hossain, Ziya Cevahir, Alexander Jesser, and Md Nur Amin. Enhanced vlsi assertion generation: Con- forming to high-level specifications and reducing llm hallucinations with rag. InDVCon Europe 2024; Design and Verification Conference and Exhibition Europe, pages 57–62. VDE, 2024
2024
-
[22]
Spotsigs: robust and efficient near duplicate detection in large web collections
Martin Theobald, Jonathan Siddharth, and Andreas Paepcke. Spotsigs: robust and efficient near duplicate detection in large web collections. InProceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 563–570, 2008
2008
-
[23]
Yonghao Wang, Jiaxin Zhou, Hongqin Lyu, Zhiteng Chao, Tiancheng Wang, and Huawei Li. Deepassert: An llm-aided verification framework with fine-grained assertion generation for modules with extracted module specifications.arXiv preprint arXiv:2509.14668, 2025
-
[24]
Yonghao Wang, Jiaxin Zhou, Yang Yin, Hongqin Lyu, Zhiteng Chao, Wenchao Ding, Jing Ye, Tiancheng Wang, and Huawei Li. Iter- ative llm-based assertion generation using syntax-semantic represen- tations for functional coverage-guided verification.arXiv preprint arXiv:2602.15388, 2026
-
[25]
A survey on assertion-based hardware verification.ACM Computing Surveys (CSUR), 54(11s):1–33, 2022
Hasini Witharana, Yangdi Lyu, Subodha Charles, and Prabhat Mishra. A survey on assertion-based hardware verification.ACM Computing Surveys (CSUR), 54(11s):1–33, 2022
2022
-
[26]
Fenghua Wu, Evan Pan, Rahul Kande, Michael Quinn, Aakash Tyagi, David Kebo, Jeyavijayan Rajendran, and Jiang Hu. Spec2assertion: Automatic pre-rtl assertion generation using large language models with progressive regularization.arXiv preprint arXiv:2505.07995, 2025
-
[27]
Picorv32: A size-optimized risc-v cpu core
YosysHQ. Picorv32: A size-optimized risc-v cpu core. https://github. com/YosysHQ/picorv32
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.