SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation
Pith reviewed 2026-05-07 10:10 UTC · model grok-4.3
The pith
SafeTune filters poisoned training data for LLMs generating RTL code by combining graph-based structural checks with semantic classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SafeTune hardens LLM fine-tuning for RTL code generation against hardware Trojan insertion by pairing a GNN that identifies anomalous circuitry patterns through structural properties with a semantic verification module that uses text embeddings and XGBoost to assess prompt security. Together, the two components filter poisoned inputs without sacrificing useful training data or requiring model modifications.
What carries the argument
The coupling of a Graph Neural Network for structural anomaly detection in circuitry with a semantic verification module based on text embeddings and XGBoost classification to filter poisoned inputs during fine-tuning.
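To make the coupling concrete, the sketch below shows the shape such a dual filter could take. The paper publishes no code, so `TinyGNN`, the feature choices, the untrained toy weights, and the 0.5 thresholds are illustrative assumptions rather than SafeTune's implementation.

```python
# Hypothetical sketch of SafeTune-style dual filtering; all names and
# thresholds here are assumptions, not the paper's published code.
import numpy as np
import torch
import torch.nn as nn
import xgboost as xgb

class TinyGNN(nn.Module):
    """One-layer message passing over a circuit graph; emits a pooled anomaly score."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, 1)

    def forward(self, A, X):
        H = torch.relu(self.lin1(A @ X))             # propagate node features along wires
        return torch.sigmoid(self.lin2(H.mean(0)))   # graph-level score in [0, 1]

def keep_example(A, X, prompt_emb, gnn, clf, struct_t=0.5, sem_t=0.5):
    """Retain a (prompt, RTL) pair only if both detectors score it as clean."""
    struct_score = gnn(A, X).item()                              # structural gate
    sem_score = clf.predict_proba(prompt_emb.reshape(1, -1))[0, 1]  # semantic gate
    return struct_score < struct_t and sem_score < sem_t

# Toy usage with random stand-ins for netlist graphs and prompt embeddings.
# The GNN's training on labeled circuit graphs is omitted here.
rng = np.random.default_rng(0)
clf = xgb.XGBClassifier(n_estimators=50).fit(
    rng.normal(size=(200, 16)), rng.integers(0, 2, 200))
gnn = TinyGNN(in_dim=8)
A, X = torch.eye(5), torch.randn(5, 8)   # 5-node toy circuit
print(keep_example(A, X, rng.normal(size=16), gnn, clf))
```

A kept example must pass both gates: in the paper's framing, the structural gate inspects the circuit graph while the semantic gate inspects the prompt.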
If this is right
- LLMs fine-tuned using SafeTune produce RTL code with fewer hardware Trojans that bypass standard functionality checks.
- The method maintains the quality and quantity of legitimate training examples by minimizing false positives in poisoning detection.
- SafeTune can be applied to existing LLMs without any architectural changes or retraining from scratch.
- Enhanced robustness leads to more reliable hardware design generation in security-sensitive applications.
Where Pith is reading between the lines
- Similar dual structural-semantic filtering could be adapted to mitigate poisoning in other specialized code generation tasks like software or firmware.
- The approach suggests that combining graph-based hardware knowledge with language model embeddings provides a general strategy for securing AI-generated designs.
Load-bearing premise
Poisoned inputs can be accurately identified and removed using GNN structural anomaly detection combined with semantic embeddings and XGBoost without incorrectly discarding many valid training examples or missing many attacks.
What would settle it
Applying SafeTune to a controlled dataset of clean RTL prompts mixed with known poisoned examples containing hardware Trojans, then verifying whether the fine-tuned LLM generates significantly fewer Trojan modules than one trained on unfiltered data while retaining most clean examples.
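A rough sketch of the bookkeeping that experiment requires follows; `is_kept` and `has_trojan` are assumed stand-ins for SafeTune's filter and a downstream Trojan detector, not interfaces from the paper.

```python
def evaluate_filter(dataset, is_kept):
    """dataset: list of (example, is_poisoned) pairs; is_kept: the filter under test."""
    kept = [(ex, p) for ex, p in dataset if is_kept(ex)]
    clean_total = sum(1 for _, p in dataset if not p) or 1
    poison_total = sum(1 for _, p in dataset if p) or 1
    return {
        # Should stay near 1.0 if legitimate data is not being sacrificed.
        "clean_retention": sum(1 for _, p in kept if not p) / clean_total,
        # Should stay near 0.0 if poisoning is being caught.
        "poison_leakage": sum(1 for _, p in kept if p) / poison_total,
    }

def trojan_rate(generations, has_trojan):
    """Fraction of generated RTL modules flagged by a downstream Trojan detector."""
    return sum(map(has_trojan, generations)) / max(len(generations), 1)
```

The decisive comparison is then `trojan_rate` for a model fine-tuned on the kept set versus one fine-tuned on the unfiltered corpus, with `clean_retention` confirming that the drop was not bought with discarded legitimate examples.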
Original abstract
As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality checks. To address this, we present SafeTune, a framework designed to harden LLM-based RTL generation against poisoning, specifically focusing on hardware Trojan (HT) insertion. SafeTune integrates two core components: (i) a Graph Neural Network (GNN) that models structural properties to identify anomalous circuitry patterns during fine-tuning, and (ii) a semantic verification module using text embeddings and an XGBoost classifier to assess prompt security. By coupling structural and semantic knowledge, SafeTune effectively filters poisoned inputs without sacrificing legitimate data. Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning without requiring modifications to the underlying model architecture.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SafeTune, a framework to protect LLM fine-tuning for RTL code generation from data poisoning attacks involving hardware Trojans. It uses a GNN to detect structural anomalies in circuitry and combines it with semantic embeddings fed to an XGBoost classifier to filter out poisoned prompts. The authors assert that this dual approach filters poisoned inputs effectively without sacrificing legitimate data and improves model robustness, all without changing the LLM architecture, and they support these claims with experimental results.
Significance. If the filtering mechanism proves effective with low false positives and the experiments confirm improved security in generated RTL code, this could be a meaningful contribution to securing AI-driven hardware design flows against supply-chain style attacks via poisoned training data. The combination of graph-based structural analysis and semantic classification is an interesting direction for this emerging problem.
major comments (2)
- [Abstract] The statement 'Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning' is not accompanied by any quantitative evidence, such as specific metrics (e.g., Trojan detection rate, false positive rate on clean data), datasets used, baselines compared, or error bars. This absence undermines the central claim that the method works without sacrificing legitimate data.
- [Proposed Framework] The description of the GNN for modeling structural properties and the semantic verification module lacks details on how the XGBoost classifier is trained (e.g., features, labeled poisoned/clean examples), decision thresholds, or validation procedures to ensure high precision without discarding useful RTL examples. This is load-bearing for the 'no sacrifice' guarantee highlighted in the abstract.
minor comments (1)
- [Abstract] The abstract could benefit from a brief mention of the specific LLM or RTL dataset used in experiments to provide context.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate the suggested clarifications and quantitative details.
Point-by-point responses
- Referee: [Abstract] The statement 'Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning' is not accompanied by any quantitative evidence, such as specific metrics (e.g., Trojan detection rate, false positive rate on clean data), datasets used, baselines compared, or error bars. This absence undermines the central claim that the method works without sacrificing legitimate data.
  Authors: We agree that the abstract would be strengthened by including specific quantitative evidence. In the revised manuscript, we will update the abstract to report key metrics from our experiments, including Trojan detection rate, false positive rate on clean data, the datasets and baselines used, and error bars from multiple runs. This will directly support the claim of effective filtering without sacrificing legitimate data. Revision: yes.
- Referee: [Proposed Framework] The description of the GNN for modeling structural properties and the semantic verification module lacks details on how the XGBoost classifier is trained (e.g., features, labeled poisoned/clean examples), decision thresholds, or validation procedures to ensure high precision without discarding useful RTL examples. This is load-bearing for the 'no sacrifice' guarantee highlighted in the abstract.
  Authors: We concur that more implementation details are needed for the semantic verification module to substantiate the no-sacrifice guarantee. In the revision, we will expand the Proposed Framework section with specifics on: the feature set extracted for the XGBoost classifier, how labeled poisoned and clean RTL examples were generated and used for training, the decision threshold selection process, and the validation procedures (including precision-recall metrics and cross-validation) that confirm high precision while retaining useful examples. These additions will make the dual GNN-semantic approach fully reproducible and transparent. Revision: yes.
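For readers wondering what the promised threshold-selection and cross-validation procedure might look like, here is one hedged possibility, not the authors' code: pick the XGBoost decision threshold that maximizes recall on poisoned examples subject to a precision floor, using out-of-fold scores so the threshold is never tuned on data the model has already seen.

```python
# Illustrative threshold selection under a precision floor; an assumption
# about how the revision's procedure could work, not the paper's method.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

def pick_threshold(X, y, precision_floor=0.95):
    """X: embedding features; y: 1 = poisoned, 0 = clean (labels assumed available)."""
    clf = xgb.XGBClassifier(n_estimators=200, max_depth=4)
    # Out-of-fold probabilities from 5-fold cross-validation.
    scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    prec, rec, thr = precision_recall_curve(y, scores)
    # precision_recall_curve returns len(thr) + 1 precision/recall points;
    # drop the last point so the arrays align with the thresholds.
    ok = prec[:-1] >= precision_floor
    if not ok.any():
        return 0.5  # fall back if the precision floor is unattainable
    # Among thresholds meeting the floor, keep the one with the highest recall.
    return float(thr[ok][np.argmax(rec[:-1][ok])])
```

A high precision floor is what protects clean RTL examples from being discarded; recall maximization then bounds how many poisoned examples slip through.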
Circularity Check
No significant circularity; empirical framework with no derivations or self-referential predictions
Full rationale
The paper presents SafeTune as an empirical filtering framework combining GNN-based structural anomaly detection with semantic embeddings and XGBoost classification to remove poisoned RTL examples. No equations, parameter fittings, or derivation chains are described in the abstract or referenced text. The central claim of effective filtering 'without sacrificing legitimate data' is an empirical assertion supported by (unshown) experiments rather than a mathematical reduction to inputs. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing steps. This is a standard non-circular empirical proposal whose validity rests on external validation of the classifier's precision/recall, not internal definitional equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Poisoned training data for RTL generation exhibits detectable structural anomalies in circuit graphs and semantic inconsistencies in prompts that can be identified by GNN and XGBoost without excessive false positives.
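This assumption only has teeth if RTL designs can be rendered as graphs in which Trojan structure is visible. A minimal sketch of one such encoding, invented here for illustration: nodes are signals, edges follow assignment dataflow, and node features are fan-in/fan-out counts, since trigger logic often shows unusual fan-in on rarely exercised signals.

```python
# Toy RTL-to-graph encoding; the tiny "netlist" format is invented for
# illustration and is not the paper's graph construction.
import torch

def netlist_to_graph(assigns):
    """assigns: list of (dst, [srcs]) pairs, e.g. from parsed `assign` statements."""
    nodes = sorted({s for dst, srcs in assigns for s in [dst, *srcs]})
    idx = {name: i for i, name in enumerate(nodes)}
    A = torch.zeros(len(nodes), len(nodes))
    for dst, srcs in assigns:
        for s in srcs:
            A[idx[s], idx[dst]] = 1.0      # edge: source signal drives destination
    # Node features: in-degree (fan-in) and out-degree (fan-out) per signal.
    X = torch.stack([A.sum(0), A.sum(1)], dim=1)
    return A, X, nodes

# e.g. `assign y = a & b; assign trig = a & b & c & d;` -- note trig's wide fan-in.
A, X, names = netlist_to_graph([("y", ["a", "b"]), ("trig", ["a", "b", "c", "d"])])
```

Graphs of this shape are what a structural detector like the GNN component would consume.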
Reference graph
Works this paper leans on
- [1] S. Thakur et al., “VeriGen: A large language model for Verilog code generation,” ACM Transactions on Design Automation of Electronic Systems, vol. 29, no. 3, May 2024.
- [2] M. Akyash, K. Azar, and H. Kamali, “RTL++: Graph-enhanced LLM for RTL code generation,” in 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025, pp. 44–50.
- [3] N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “MeltRTL: Multi-expert LLMs with inference-time intervention for RTL code generation,” arXiv preprint arXiv:2601.13015, 2026.
- [4] M. Akyash, K. Azar, and H. Kamali, “DecoRTL: A run-time decoding framework for RTL code generation with LLMs,” in 2025 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2025, pp. 1–9.
- [5] K. Maddala, B. Mali, and C. Karfa, “LAAG-RV: LLM assisted assertion generation for RTL design verification,” in 2024 IEEE 8th International Test Conference India (ITC India), 2024, pp. 1–6.
- [6] R. Kande, H. Pearce, B. Tan, B. Dolan-Gavitt, S. Thakur, R. Karri, and J. Rajendran, “(Security) assertions by large language models,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 4374–4389, 2024.
- [7] N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “LLM-IFT: LLM-powered information flow tracking for secure hardware,” in 2025 IEEE 43rd VLSI Test Symposium (VTS), 2025, pp. 1–5.
- [8] H. Wu, Z. He, X. Zhang, X. Yao, S. Zheng, H. Zheng, and B. Yu, “ChatEDA: A large language model powered autonomous agent for EDA,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 10, pp. 3184–3197, 2024.
- [9] J. Pan et al., “A survey of research in large language models for electronic design automation,” ACM Transactions on Design Automation of Electronic Systems, 2025, to appear.
- [10] M. Akyash and H. M. Kamali, “Evolutionary large language models for hardware security: A comparative survey,” in Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), 2024, pp. 496–501.
- [11] S. Liu et al., “RTLCoder: Outperforming GPT-3.5 in design RTL generation with our open-source dataset and lightweight solution,” in IEEE International Workshop on LLM-Aided Design (LAD), 2024, pp. 1–5.
- [12] A. Wei, H. Tan, T. Suresh, D. Mendoza, T. S. F. X. Teixeira, K. Wang, C. Trippel, and A. Aiken, “VeriCoder: Enhancing LLM-based RTL code generation through functional correctness validation,” arXiv preprint arXiv:2504.15659, 2025.
- [13] L. L. Mankali et al., “RTL-Breaker: Assessing the security of LLMs against backdoor attacks on HDL code generation,” in Proceedings of the Design, Automation and Test in Europe Conference (DATE), 2025, pp. 1–7.
- [14] S. Thakur, A. Singh, V. W. Lee, and S. Garg, “Benchmarking large language models for automated Verilog RTL code generation,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE), Apr. 2023, pp. 1–6.
- [15] Y. Lu et al., “RTLLM: An open-source benchmark for design RTL generation with large language models,” in Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024, pp. 1–7.
- [16] N. Pinckney et al., “Comprehensive Verilog design problems (CVDP): A next-generation benchmark dataset for evaluating large language models and agents on RTL design and verification,” arXiv preprint arXiv:2506.14074, 2025.
- [17] Z. Wang et al., “SALAD: Systematic assessment of machine unlearning on LLM-aided hardware design,” arXiv preprint arXiv:2506.02089, 2025.
- [18] N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “CircuitGuard: Mitigating LLM memorization in RTL code generation against IP leakage,” in 2025 IEEE 43rd International Conference on Computer Design (ICCD), 2025, pp. 790–797.
- [19] S. Liu et al., “OpenLLM-RTL: Open dataset and benchmark for LLM-aided design RTL generation,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2024.
- [20] A. Jain, Z. Zhou, and U. Guin, “Survey of recent developments for hardware Trojan detection,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2021, pp. 1–5.
- [21] Y. Qu, X. Han, H. Wu, and S. Cheng, “BadCodePrompt: Backdoor attacks against prompt engineering of large language models for code generation,” Automated Software Engineering, vol. 32, no. 1, p. 17, 2025.
- [22] T. Fu et al., “PoisonBench: Assessing large language model vulnerability to data poisoning,” arXiv preprint arXiv:2410.08811, 2024.
- [23] Y. Li et al., “Verilog-to-PyG: A framework for graph learning and augmentation on RTL designs,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023, pp. 1–8.
- [24] H. Lashen, L. Alrahis, J. Knechtel, and O. Sinanoglu, “TrojanSAINT: Gate-level netlist sampling-based inductive learning for hardware Trojan detection,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2023, pp. 1–5.
- [25] R. Yasaei et al., “Hardware Trojan detection using graph neural networks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 1, pp. 25–38, 2022.
- [26] M. A. Ayub and S. Majumdar, “Embedding-based classifiers can detect prompt injection attacks,” arXiv preprint arXiv:2410.22284, 2024.
- [27] M. Akyash, K. Azar, and H. M. Kamali, “RTL++: Graph-enhanced LLM for RTL code generation,” arXiv preprint arXiv:2505.13479, 2025.
- [28] H. Salmani, M. Tehranipoor, and R. Karri, “On design vulnerability analysis and trust benchmark development,” in Proceedings of the IEEE International Conference on Computer Design (ICCD), 2013, pp. 1–8.
- [29] M. O. Faruque, P. Jamieson, A. Patooghy, and A.-H. A. Badawy, “Unleashing GHOST: An LLM-powered framework for automated hardware Trojan design,” arXiv preprint arXiv:2412.02816, 2024.
- [30] Z. Li, X. Zhang, Y. Zhang, D. Long, P. Xie, and M. Zhang, “Towards general text embeddings with multi-stage contrastive learning,” arXiv preprint arXiv:2308.03281, 2023.