VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog
Pith reviewed 2026-05-10 11:40 UTC · model grok-4.3
The pith
An embedding-based framework detects and localizes common vulnerabilities in Verilog code at the line level.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an embedding-based bug-detection framework detects and classifies bugs in Verilog code at module and line-level granularity. It achieves about 89 percent precision in identifying common CWEs such as CWE-1244 and CWE-1245, and 96 percent accuracy in detecting line-level bugs.
What carries the argument
An embedding-based bug-detection framework that maps Verilog code to vectors, with the aim of capturing semantic vulnerabilities and enabling line-level localization.
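To make "turning Verilog code into vectors" concrete, here is a minimal sketch that embeds each source line as a hashed bag-of-tokens vector and scores it by cosine similarity against one reference line. This is a toy stand-in and not the paper's model: the abstract does not say which embedding model or classifier is used, and `embed_line`, `score_lines`, `DIM`, and the tokenizer pattern are all hypothetical names chosen for illustration.

```python
import math
import re
import zlib

DIM = 64  # hypothetical vector size for this toy example


def embed_line(line):
    """Hashed bag-of-tokens vector, L2-normalized (a toy, not a learned embedding)."""
    vec = [0.0] * DIM
    # Split into identifiers, numbers, and single punctuation characters.
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", line):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Dot product; equals cosine similarity because inputs are unit length or zero."""
    return sum(x * y for x, y in zip(a, b))


def score_lines(module_src, reference_line):
    """Return (1-based line number, similarity to the reference line) per source line."""
    ref = embed_line(reference_line)
    return [
        (number, cosine(embed_line(text), ref))
        for number, text in enumerate(module_src.splitlines(), start=1)
    ]
```

Ranking lines by this score gives a crude form of line-level localization. A real system would presumably replace the hashing step with a learned code embedding and the single reference line with a trained classifier.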
If this is right
- Line-level localization becomes feasible for semantic bugs that evade rule-based checks.
- Module-level and line-level classification of specific CWEs is performed in one pass.
- Security review of LLM-generated RTL code gains a practical detection tool.
- Detection works on both module and individual line granularity.
Where Pith is reading between the lines
- The same embedding approach could be tested on other hardware description languages to check generality.
- Combining the embeddings with existing formal properties might raise overall detection rates.
- The method supplies a concrete baseline for comparing future line-level hardware vulnerability tools.
Load-bearing premise
Embedding vectors derived from Verilog code can reliably capture semantic vulnerabilities and enable precise line-level localization where rule-based and formal methods fail.
What would settle it
A held-out set of Verilog modules containing documented instances of CWE-1244 or CWE-1245 at known lines where the embedding method returns precision below 80 percent or line-level accuracy below 90 percent.
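The test above can be written down as a small check. The sketch below assumes module-level predictions are given as sets of flagged module ids and line-level predictions as parallel lists of 0/1 labels; the function names and data layout are hypothetical, while the 80 and 90 percent thresholds come from the statement above.

```python
def precision(predicted, actual):
    """Share of flagged modules that truly contain the CWE.

    Both arguments are sets of module ids. Returning 0.0 when nothing
    is flagged is a convention chosen for this sketch.
    """
    return len(predicted & actual) / len(predicted) if predicted else 0.0


def line_accuracy(predicted_labels, gold_labels):
    """Share of lines whose buggy (1) or clean (0) label matches the ground truth."""
    if len(predicted_labels) != len(gold_labels):
        raise ValueError("label lists must have the same length")
    if not gold_labels:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return hits / len(gold_labels)


def claim_survives(prec, acc, min_prec=0.80, min_acc=0.90):
    """The claim is refuted if either metric falls below its threshold."""
    return prec >= min_prec and acc >= min_acc
```

Because most lines in a module are clean, line-level accuracy can be high even when few buggy lines are found, so a held-out evaluation would ideally report per-line precision and recall as well.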
Original abstract
Large Language Models (LLMs) have shown significant improvement in RTL code generation. Despite the advances, the generated code is often riddled with common vulnerabilities and weaknesses (CWEs) that can slip by untrained eyes. Attackers can often exploit these weaknesses to fulfill their nefarious motives. Existing RTL bug-detection techniques rely on rule-based checks, formal properties, or coarse-grained structural analysis, which either fail to capture semantic vulnerabilities or lack precise localization. In our work, we bridge this gap by proposing an embedding-based bug-detection framework that detects and classifies bugs at both module and line-level granularity. Our method achieves about 89% precision in identifying common CWEs such as CWE-1244 and CWE-1245, and 96% accuracy in detecting line-level bugs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces VeriCWEty, an embedding-enabled framework for detecting Common Weakness Enumerations (CWEs) in Verilog RTL code at both module and line-level granularity. It positions the approach as bridging gaps in rule-based, formal, and coarse-grained structural methods by using embeddings to capture semantic vulnerabilities, claiming approximately 89% precision on CWEs such as CWE-1244 and CWE-1245 along with 96% accuracy for line-level bug detection.
Significance. If the central claims hold under proper validation, the work could meaningfully advance hardware security tooling by offering semantic, localized CWE detection in Verilog where existing techniques are limited. The embedding-based line-level localization, if mechanistically sound, would represent a useful direction beyond module-level classification.
major comments (2)
- [Abstract] Abstract: performance metrics (89% precision, 96% accuracy) are stated without any description of the dataset, model architecture, training procedure, baselines, validation splits, or statistical significance. This absence makes it impossible to determine whether the numbers support the central claim of effective line-level CWE detection.
- [Method description] Method description: the embedding pipeline for line-level output is not specified (e.g., independent per-line embeddings, token-level classifiers, or post-hoc attribution). Without this mechanism or ablations on context window size, the reported 96% line-level accuracy risks reflecting module-level detection rather than true per-line localization, undermining the claimed advantage over rule-based tools.
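The context-window concern in the second comment can be illustrated with a short sketch. It assumes, hypothetically, that each line is embedded together with up to `k` neighbouring lines on each side; the manuscript does not specify this, and `line_windows` is an illustrative name.

```python
def line_windows(lines, k):
    """For each line, return the text of that line plus up to k lines on each side."""
    windows = []
    for i in range(len(lines)):
        lo = max(0, i - k)
        hi = min(len(lines), i + k + 1)
        windows.append("\n".join(lines[lo:hi]))
    return windows
```

With `k = 0` every line is embedded independently. Once `k` reaches the module length, every window is the whole module, so all lines receive identical input and any per-line prediction reduces to a module-level one. An ablation over `k` would show where the reported line-level accuracy sits between these two extremes.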
minor comments (1)
- [Abstract] Abstract: the phrase 'common CWEs such as CWE-1244 and CWE-1245' is used without enumerating all evaluated CWEs or providing concrete bug examples.
Simulated Author's Rebuttal
We appreciate the referee's comments on the need for greater clarity in the abstract and method description. We respond to each point below, agreeing where revisions are needed to strengthen the manuscript.
- Referee: [Abstract] Abstract: performance metrics (89% precision, 96% accuracy) are stated without any description of the dataset, model architecture, training procedure, baselines, validation splits, or statistical significance. This absence makes it impossible to determine whether the numbers support the central claim of effective line-level CWE detection.
  Authors: We agree that the abstract does not provide sufficient context for the reported metrics. To address this, we will revise the abstract to include a concise description of the dataset used, the embedding model architecture, the training and validation procedures, and the baselines considered. This will allow readers to better assess the validity of the 89% precision and 96% accuracy claims. Revision: yes.
- Referee: [Method description] Method description: the embedding pipeline for line-level output is not specified (e.g., independent per-line embeddings, token-level classifiers, or post-hoc attribution). Without this mechanism or ablations on context window size, the reported 96% line-level accuracy risks reflecting module-level detection rather than true per-line localization, undermining the claimed advantage over rule-based tools.
  Authors: We acknowledge that the specific mechanism for generating line-level outputs from embeddings is not detailed in the manuscript. We will revise the method section to clearly specify the embedding pipeline, including how per-line classifications are obtained (e.g., via context-aware embeddings or attribution methods), and include ablations on context window sizes to confirm that the line-level accuracy reflects genuine localization rather than module-level effects. Revision: yes.
Circularity Check
No circularity in derivation chain
The paper presents an empirical embedding-based ML framework for CWE detection in Verilog, reporting experimental precision and accuracy metrics on datasets. No mathematical derivations, equations, or self-referential predictions appear in the abstract or described approach. Claims rest on standard training of embeddings and classifiers rather than any tautological reduction of outputs to inputs by construction, fitted parameters renamed as predictions, or load-bearing self-citations. The method is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes that collapse the central result.
Forward citations
Cited by 2 Pith papers
- LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges. A survey of LLM applications in secure hardware design covering EDA synthesis, vulnerability analysis, countermeasures, and educational uses.
- LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges. LLMs enable RTL code generation and vulnerability analysis in hardware design but introduce data contamination and adversarial risks that require red-teaming and dynamic benchmarking.