arxiv: 2604.08932 · v1 · submitted 2026-04-10 · 💻 cs.AR

Recognition: unknown

From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation

Yonghao Wang , Hongqin Lyu , Boling Chen , MinYang Bao , Wenchao Ding , Feng Gu , Zhiteng Chao , Jianan Mu

show 3 more authors

Kan Shi Tiancheng Wang Huawei Li

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:22 UTC · model grok-4.3

classification 💻 cs.AR

keywords RTL verificationassertion-based verificationLLM assertion generationsemantic graphskey signal identificationtargeted verificationfunctional coveragemutation testing

0 comments

The pith

AgileAssert uses RTL semantic graphs and hybrid scoring to identify critical signals, then slices context so LLMs generate targeted assertions that cut total count by two thirds while holding or raising coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to replace indiscriminate assertion generation with a focused approach that first locates the signals whose values most control design behavior and error spread. It builds a semantic graph of the RTL, scores signals by a combination of functional importance and propagation potential, keeps only the top signals, and slices the surrounding code structure to give the LLM just the relevant context. This produces fewer assertions that still verify the key parts of the design. The shift matters because verification dominates IC development time and cost, so any method that delivers equivalent or better checks with substantially less output and input effort can reduce both engineering hours and compute resources.

Core claim

AgileAssert constructs RTL semantic graphs and identifies the top-K critical signals via a hybrid scoring and selection mechanism, followed by structure-aware RTL slicing to provide the LLM with precise targets and contextual information, thereby guiding LLMs to generate tightly constrained targeted assertions for efficient verification.

What carries the argument

RTL semantic graphs with hybrid scoring and selection to pick top-K functionally critical signals, followed by structure-aware slicing that supplies the LLM with focused context for targeted assertion generation.

If this is right

On evaluated block- and CPU-level designs the approach reduces the number of assertions by an average of 66.68 percent compared with three existing state-of-the-art methods.
Coverage metrics improve while the tokens fed to the LLM drop by 64 percent.
In mutation testing the method exceeds prior approaches in error detection rate even though the average number of assertions falls by 72.74 percent.
Verification effort moves from blanket coverage to cost-effective targeted coverage that still exercises the design's core functionality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph-plus-scoring step could be reused with other assertion languages or formal engines that accept signal-focused constraints.
Lower token consumption opens the method to designs too large for full-context prompting without truncation.
Targeted assertions may prove easier to review and maintain because each one is tied to a documented high-impact signal.
Combining the sliced context with static timing or power analysis could further prioritize signals that affect both correctness and physical constraints.

Load-bearing premise

The hybrid scoring mechanism on the semantic graphs reliably identifies signals with the greatest impact on functionality and error propagation.

What would settle it

Run the method on a fresh block- or CPU-level design and measure whether the generated assertions achieve at least the same functional coverage and mutation error detection as full indiscriminate generation; if coverage falls or critical bugs are missed while assertion count drops, the claim does not hold.

Figures

Figures reproduced from arXiv: 2604.08932 by Boling Chen, Feng Gu, Hongqin Lyu, Huawei Li, Jianan Mu, Kan Shi, MinYang Bao, Tiancheng Wang, Wenchao Ding, Yonghao Wang, Zhiteng Chao.

**Figure 2.** Figure 2: The workflow of the AgileAssert framework. Our RTL parser constructs a semantic graph from the RTL code. Signals [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Example of a simplified RTL semantic graph from an [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Example prompt for targeted assertion genera [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Signal-centric RTL slicing and prompt-driven assertion [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 7.** Figure 7: COI coverage vs. input token consumption across [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: Ablation Study on COI* Coverage Using Different Signal Ranking Configurations [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗

read the original abstract

Functional verification has become the most time-consuming phase in IC development, and Assertion-Based Verification (ABV) is key to reducing debugging time. However, existing LLM-based assertion generation methods typically pursue indiscriminate verification, aiming for maximal coverage without considering signal criticality, whereas industrial practice demands maximizing coverage with minimal verification cost. Consequently, identifying signals that have the greatest impact on design functionality and error propagation-enabling a shift from indiscriminate to targeted verification-remains a key challenge. To address this, we propose AgileAssert, a key signal-driven assertion generation framework that constructs RTL semantic graphs and identifies the top-K critical signals via a hybrid scoring and selection mechanism, followed by structure-aware RTL slicing to provide the LLM with precise targets and contextual information, thereby guiding LLMs to generate tightly constrained targeted assertions for efficient verification. Evaluated on block- and CPU-level designs, with an average 66.68% reduction in assertions, our approach outperforms three existing SOTA methods, and significantly improving coverage metrics while reducing input token consumption by 64%. In mutation testing, when our approach surpasses existing methods in error detection rate, the average number of assertions used decreases by 72.74%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AgileAssert adds a graph-based key-signal filter to LLM assertion generation for RTL, claiming big cuts in volume and tokens, but the supporting experiments need more transparency.

read the letter

The core idea is a pipeline that builds an RTL semantic graph, scores signals with a hybrid mechanism, slices the design around the top-K critical ones, and feeds that focused context to an LLM for assertion generation. This shifts from blanket coverage to targeted verification, which matches what industrial flows actually need: fewer assertions that still catch errors without blowing up compute or token budgets. On the designs they tested, the numbers are 66.68% fewer assertions on average, 64% lower token use, and better coverage plus mutation detection than three prior SOTA methods. In the mutation cases where it beats the baselines on error detection, it uses 72.74% fewer assertions. That combination of graph construction, hybrid scoring, and structure-aware slicing is the concrete new piece; it is not just another prompt-engineering trick on top of existing LLM assertion work. The approach is practical and directly addresses cost in ABV, which is useful for anyone who has watched assertion counts explode in large blocks or CPU cores. The main weakness is the thin experimental reporting. The abstract gives the headline percentages but supplies no description of the exact baselines, the designs beyond “block- and CPU-level,” how faults were injected for mutation testing, or any ablation on the scoring function itself. Without those, it is hard to judge whether the hybrid scorer actually ranks signals by functional impact or error propagation, or whether the gains simply come from feeding the LLM less context. The concern that the scoring step may not reliably isolate high-value signals therefore lands; the paper would be stronger with sensitivity checks on K and correlation between scores and actual verification utility. This work is aimed at hardware verification researchers and EDA tool developers who already use or evaluate LLMs for assertion generation. A reader in that group can extract the targeted-slicing idea and try it even if the current numbers need re-checking. It is coherent enough on its own terms to deserve a serious referee, who would likely press for the missing methodology details and ablations. I would send it to review rather than desk-reject, but with a clear request for the full experimental protocol and validation of the key-signal step.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces AgileAssert, a framework for targeted assertion generation in RTL verification. It builds RTL semantic graphs, applies a hybrid scoring mechanism to select top-K critical signals based on functional impact and error propagation, performs structure-aware slicing to provide focused context, and uses LLMs to generate efficient assertions. On block- and CPU-level designs, it reports an average 66.68% reduction in assertions and 64% fewer input tokens versus three SOTA methods, with improved coverage metrics and, in mutation testing, a 72.74% reduction in assertions while achieving higher error detection rates.

Significance. If the results hold under rigorous validation, the work could meaningfully advance assertion-based verification by reducing the cost of generating and maintaining assertions without sacrificing effectiveness. The graph-driven targeting of signals addresses a practical gap between maximal-coverage LLM methods and industrial constraints on verification overhead.

major comments (2)

§5 (Experimental Evaluation): The abstract and results claim concrete gains (66.68% assertion reduction, 64% token reduction, superior coverage and mutation scores) but supply no description of experimental setup, baseline reproduction details, number of designs/runs, statistical significance tests, or error bars; without these the outperformance claims cannot be assessed.
§3.2 (Hybrid Scoring and Signal Selection): The hybrid scoring mechanism (graph centrality + fan-in/out + semantic features) is load-bearing for the central claim that targeted slicing produces better assertions than indiscriminate generation; however, no ablation studies, sensitivity analysis on K, or correlation of scores with injected faults/mutation outcomes are reported to show the scoring actually isolates high-impact signals rather than simply reducing context size.

minor comments (1)

Abstract: The phrase 'significantly improving coverage metrics' is vague; specific metrics (e.g., line, branch, or functional coverage) and quantitative deltas should be stated.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and will revise the manuscript to strengthen the experimental reporting and analysis of the hybrid scoring mechanism.

read point-by-point responses

Referee: §5 (Experimental Evaluation): The abstract and results claim concrete gains (66.68% assertion reduction, 64% token reduction, superior coverage and mutation scores) but supply no description of experimental setup, baseline reproduction details, number of designs/runs, statistical significance tests, or error bars; without these the outperformance claims cannot be assessed.

Authors: We agree that additional experimental details are needed for full reproducibility and assessment of the claims. In the revised manuscript we will expand §5 to include: a complete list of the block-level and CPU-level designs with their characteristics, explicit reproduction steps and hyperparameters for the three SOTA baselines, the number of independent runs performed, statistical significance tests (e.g., paired t-tests with p-values), and error bars or standard deviations on all reported metrics including assertion counts, token usage, coverage, and mutation scores. revision: yes
Referee: §3.2 (Hybrid Scoring and Signal Selection): The hybrid scoring mechanism (graph centrality + fan-in/out + semantic features) is load-bearing for the central claim that targeted slicing produces better assertions than indiscriminate generation; however, no ablation studies, sensitivity analysis on K, or correlation of scores with injected faults/mutation outcomes are reported to show the scoring actually isolates high-impact signals rather than simply reducing context size.

Authors: We acknowledge that dedicated ablations would more directly isolate the contribution of the hybrid scoring. While the end-to-end comparisons already show gains in coverage and mutation detection, we will add the requested analyses in the revision: (1) component-wise ablations of the hybrid score, (2) sensitivity curves for different values of K reporting effects on assertion count, token usage, coverage, and error detection, and (3) correlation plots or tables linking per-signal hybrid scores to mutation outcomes. These additions will demonstrate that high-impact signals are preferentially selected rather than the benefit arising solely from smaller context. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external SOTA comparisons, not internal derivations or self-referential fits

full rationale

The provided abstract and context contain no equations, derivations, or first-principles claims that reduce to their own inputs. The framework (RTL semantic graphs, hybrid scoring for top-K signals, structure-aware slicing, LLM assertion generation) is described procedurally, with performance quantified solely via empirical evaluation on block- and CPU-level designs against three external SOTA baselines. Metrics such as 66.68% assertion reduction, 64% token reduction, and mutation-testing gains are presented as measured outcomes rather than predictions forced by fitting or self-definition. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results appears. The derivation chain is therefore self-contained against external benchmarks, yielding a normal non-finding.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review supplies insufficient detail to enumerate free parameters, axioms, or invented entities with precision; the top-K selection and hybrid scoring are implied but not quantified or justified.

free parameters (1)

K (number of top critical signals)
Central parameter in the hybrid scoring and selection mechanism; its value and selection criteria are not numerically specified.

axioms (1)

domain assumption RTL semantic graphs accurately capture design functionality and error propagation paths
Invoked when constructing graphs to identify critical signals.

pith-pipeline@v0.9.0 · 5543 in / 1540 out tokens · 80440 ms · 2026-05-10T17:22:25.761426+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 7 canonical work pages

[1]

https://iwls.org/iwls2005/benchmarks.html

Iwls 2005 benchmaeks. https://iwls.org/iwls2005/benchmarks.html

2005
[2]

https://community.cadence.com/ cadence blogs 8/b/breakfast-bytes/posts/jasper-ml

Jaspergold: the next generation. https://community.cadence.com/ cadence blogs 8/b/breakfast-bytes/posts/jasper-ml
[3]

https://github.com/liangkangnan/tinyriscv

tinyriscv. https://github.com/liangkangnan/tinyriscv
[4]

Focs–automatic generation of simulation checkers from formal specifications

Yael Abarbanel, Ilan Beer, Leonid Gluhovsky, Sharon Keidar, and Yaron Wolfsthal. Focs–automatic generation of simulation checkers from formal specifications. InComputer Aided Verification: 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19,

2000
[5]

Springer, 2000

Proceedings 12, pages 538–542. Springer, 2000

2000
[6]

A survey of computer architecture simulation techniques and tools.Ieee Access, 7:78120–78145, 2019

Ayaz Akram and Lina Sawalha. A survey of computer architecture simulation techniques and tools.Ieee Access, 7:78120–78145, 2019

2019
[7]

Assertionforge: Enhancing formal verification assertion generation with structured representation of specifications and rtl.arXiv preprint arXiv:2503.19174, 2025

Yunsheng Bai, Ghaith Bany Hamad, Syed Suhaib, and Haoxing Ren. Assertionforge: Enhancing formal verification assertion generation with structured representation of specifications and rtl.arXiv preprint arXiv:2503.19174, 2025

work page arXiv 2025
[8]

Betweenness centrality in large complex networks

Marc Barthelemy. Betweenness centrality in large complex networks. The European physical journal B, 38(2):163–168, 2004

2004
[9]

Improving design verifiability by early rtl coverability analysis

Kai-Hui Chang, Chia-Wei Chang, Jie-Hong Roland Jiang, and Chien- Nan Jimmy Liu. Improving design verifiability by early rtl coverability analysis. InTenth ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMCODE2012), pages 25–32. IEEE, 2012

2012
[10]

LongRoPE: Extending LLM context window beyond 2 million tokens

Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. Longrope: Extend- ing llm context window beyond 2 million tokens.arXiv preprint arXiv:2402.13753, 2024

work page arXiv 2024
[11]

AssertLLM: Generating and evaluating hardware verification asser- tions from design specifications via multi-LLMs,

Wenji Fang, Mengming Li, Min Li, Zhiyuan Yan, Shang Liu, Zhiyao Xie, and Hongce Zhang. Assertllm: Generating and evaluating hardware verification assertions from design specifications via multi-llms.arXiv preprint arXiv:2402.00386, 2024

work page arXiv 2024
[12]

Harry Foster. 2024 wilson research group ic/asic functional verification trend report.https://resources.sw.siemens.com/zh-TW/white-paper-2024- wilson-research-group-ic-asic-functional-verification-trend-report/

2024
[13]

Assertion-based verification: Industry myths to realities (invited tutorial)

Harry Foster. Assertion-based verification: Industry myths to realities (invited tutorial). InComputer Aided Verification: 20th International Conference, CAV 2008 Princeton, NJ, USA, July 7-14, 2008 Proceedings 20, pages 5–10. Springer, 2008

2008
[14]

Pagerank beyond the web.siam REVIEW, 57(3):321– 363, 2015

David F Gleich. Pagerank beyond the web.siam REVIEW, 57(3):321– 363, 2015

2015
[15]

Realbench: Benchmarking verilog generation models with real-world ip designs.arXiv preprint arXiv:2507.16200,

Pengwei Jin, Di Huang, Chongxiao Li, Shuyao Cheng, Yang Zhao, Xinyao Zheng, Jiaguo Zhu, Shuyi Xing, Bohan Dou, Rui Zhang, et al. Realbench: Benchmarking verilog generation models with real-world ip designs.arXiv preprint arXiv:2507.16200, 2025

work page arXiv 2025
[16]

(security) assertions by large language models.IEEE Transactions on Information Forensics and Security, 2024

Rahul Kande, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Shailja Thakur, Ramesh Karri, and Jeyavijayan Rajendran. (security) assertions by large language models.IEEE Transactions on Information Forensics and Security, 2024

2024
[17]

Ai-accelerating coverage closure using intelligent stimulus generation

Jainender Kumar, Ronak Bhatt, Garima Srivastava, Ashutosh Sinha, and Prashant Teotia. Ai-accelerating coverage closure using intelligent stimulus generation
[18]

Assertgen: Enhancement of llm-aided assertion generation through cross-layer signal bridging

Hongqin Lyu, Yonghao Wang, Yunlin Du, Mingyu Shi, Zhiteng Chao, Wenxing Li, Tiancheng Wang, and Huawei Li. Assertgen: Enhancement of llm-aided assertion generation through cross-layer signal bridging. In 31th Asia and South Pacific Design Automation Conference. IEEE, 2026

2026
[19]

Assertminer: Module-level spec generation and assertion mining using static analysis guided llms

Hongqin Lyu, Yonghao Wang, Jiaxin Zhou, Zhiteng Chao, Tiancheng Wang, and Huawei Li. Assertminer: Module-level spec generation and assertion mining using static analysis guided llms. In2026 31st Asia and South Pacific Design Automation Conference (ASP-DAC), pages 432– 438, 2026

2026
[20]

Using of jaccard coefficient for keywords similarity

Suphakit Niwattanakul, Jatsada Singthongchai, Ekkachai Naenudorn, and Supachanun Wanapu. Using of jaccard coefficient for keywords similarity. InProceedings of the international multiconference of engineers and computer scientists, volume 1, pages 380–384, 2013

2013
[21]

Enhanced vlsi assertion generation: Con- forming to high-level specifications and reducing llm hallucinations with rag

Hafiz Abdul Quddus, Md Sanowar Hossain, Ziya Cevahir, Alexander Jesser, and Md Nur Amin. Enhanced vlsi assertion generation: Con- forming to high-level specifications and reducing llm hallucinations with rag. InDVCon Europe 2024; Design and Verification Conference and Exhibition Europe, pages 57–62. VDE, 2024

2024
[22]

Spotsigs: robust and efficient near duplicate detection in large web collections

Martin Theobald, Jonathan Siddharth, and Andreas Paepcke. Spotsigs: robust and efficient near duplicate detection in large web collections. InProceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 563–570, 2008

2008
[23]

Deepassert: An llm-aided verification framework with fine-grained assertion generation for modules with extracted module specifications.arXiv preprint arXiv:2509.14668, 2025

Yonghao Wang, Jiaxin Zhou, Hongqin Lyu, Zhiteng Chao, Tiancheng Wang, and Huawei Li. Deepassert: An llm-aided verification framework with fine-grained assertion generation for modules with extracted module specifications.arXiv preprint arXiv:2509.14668, 2025

work page arXiv 2025
[24]

Iter- ative llm-based assertion generation using syntax-semantic represen- tations for functional coverage-guided verification.arXiv preprint arXiv:2602.15388, 2026

Yonghao Wang, Jiaxin Zhou, Yang Yin, Hongqin Lyu, Zhiteng Chao, Wenchao Ding, Jing Ye, Tiancheng Wang, and Huawei Li. Iter- ative llm-based assertion generation using syntax-semantic represen- tations for functional coverage-guided verification.arXiv preprint arXiv:2602.15388, 2026

work page arXiv 2026
[25]

A survey on assertion-based hardware verification.ACM Computing Surveys (CSUR), 54(11s):1–33, 2022

Hasini Witharana, Yangdi Lyu, Subodha Charles, and Prabhat Mishra. A survey on assertion-based hardware verification.ACM Computing Surveys (CSUR), 54(11s):1–33, 2022

2022
[26]

Spec2Assertion: Automatic pre- RTL assertion generation using large language models with progressive regularization,

Fenghua Wu, Evan Pan, Rahul Kande, Michael Quinn, Aakash Tyagi, David Kebo, Jeyavijayan Rajendran, and Jiang Hu. Spec2assertion: Automatic pre-rtl assertion generation using large language models with progressive regularization.arXiv preprint arXiv:2505.07995, 2025

work page arXiv 2025
[27]

Picorv32: A size-optimized risc-v cpu core

YosysHQ. Picorv32: A size-optimized risc-v cpu core. https://github. com/YosysHQ/picorv32