SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation
Pith reviewed 2026-05-07 10:10 UTC · model grok-4.3
The pith
SafeTune filters poisoned training data for LLMs generating RTL code by combining graph-based structural checks with semantic classification.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SafeTune hardens LLM fine-tuning for RTL code generation against hardware Trojan insertion by pairing a GNN that identifies anomalous circuitry patterns through structural properties with a semantic verification module that uses text embeddings and XGBoost to assess prompt security. Together, the two components filter poisoned inputs without sacrificing useful training data or requiring model modifications.
What carries the argument
The coupling of a Graph Neural Network for structural anomaly detection in circuitry with a semantic verification module based on text embeddings and XGBoost classification to filter poisoned inputs during fine-tuning.
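To make the coupling concrete, the sketch below shows the shape such a dual filter could take. The paper publishes no code, so `TinyGNN`, the feature choices, the untrained toy weights, and the 0.5 thresholds are illustrative assumptions rather than SafeTune's implementation.

```python
# Hypothetical sketch of SafeTune-style dual filtering; all names and
# thresholds here are assumptions, not the paper's published code.
import numpy as np
import torch
import torch.nn as nn
import xgboost as xgb

class TinyGNN(nn.Module):
    """One-layer message passing over a circuit graph; emits a pooled anomaly score."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, 1)

    def forward(self, A, X):
        H = torch.relu(self.lin1(A @ X))             # propagate node features along wires
        return torch.sigmoid(self.lin2(H.mean(0)))   # graph-level score in [0, 1]

def keep_example(A, X, prompt_emb, gnn, clf, struct_t=0.5, sem_t=0.5):
    """Retain a (prompt, RTL) pair only if both detectors score it as clean."""
    struct_score = gnn(A, X).item()                              # structural gate
    sem_score = clf.predict_proba(prompt_emb.reshape(1, -1))[0, 1]  # semantic gate
    return struct_score < struct_t and sem_score < sem_t

# Toy usage with random stand-ins for netlist graphs and prompt embeddings.
# The GNN's training on labeled circuit graphs is omitted here.
rng = np.random.default_rng(0)
clf = xgb.XGBClassifier(n_estimators=50).fit(
    rng.normal(size=(200, 16)), rng.integers(0, 2, 200))
gnn = TinyGNN(in_dim=8)
A, X = torch.eye(5), torch.randn(5, 8)   # 5-node toy circuit
print(keep_example(A, X, rng.normal(size=16), gnn, clf))
```

A kept example must pass both gates: in the paper's framing, the structural gate inspects the circuit graph while the semantic gate inspects the prompt.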
If this is right
- LLMs fine-tuned using SafeTune produce RTL code with fewer hardware Trojans that bypass standard functionality checks.
- The method maintains the quality and quantity of legitimate training examples by minimizing false positives in poisoning detection.
- SafeTune can be applied to existing LLMs without any architectural changes or retraining from scratch.
- Enhanced robustness leads to more reliable hardware design generation in security-sensitive applications.
Where Pith is reading between the lines
- Similar dual structural-semantic filtering could be adapted to mitigate poisoning in other specialized code generation tasks like software or firmware.
- The approach suggests that combining graph-based hardware knowledge with language model embeddings provides a general strategy for securing AI-generated designs.
Load-bearing premise
Poisoned inputs can be accurately identified and removed using GNN structural anomaly detection combined with semantic embeddings and XGBoost without incorrectly discarding many valid training examples or missing many attacks.
What would settle it
Applying SafeTune to a controlled dataset of clean RTL prompts mixed with known poisoned examples containing hardware Trojans, then verifying whether the fine-tuned LLM generates significantly fewer Trojan modules than one trained on unfiltered data while retaining most clean examples.
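A rough sketch of the bookkeeping that experiment requires follows; `is_kept` and `has_trojan` are assumed stand-ins for SafeTune's filter and a downstream Trojan detector, not interfaces from the paper.

```python
def evaluate_filter(dataset, is_kept):
    """dataset: list of (example, is_poisoned) pairs; is_kept: the filter under test."""
    kept = [(ex, p) for ex, p in dataset if is_kept(ex)]
    clean_total = sum(1 for _, p in dataset if not p) or 1
    poison_total = sum(1 for _, p in dataset if p) or 1
    return {
        # Should stay near 1.0 if legitimate data is not being sacrificed.
        "clean_retention": sum(1 for _, p in kept if not p) / clean_total,
        # Should stay near 0.0 if poisoning is being caught.
        "poison_leakage": sum(1 for _, p in kept if p) / poison_total,
    }

def trojan_rate(generations, has_trojan):
    """Fraction of generated RTL modules flagged by a downstream Trojan detector."""
    return sum(map(has_trojan, generations)) / max(len(generations), 1)
```

The decisive comparison is then `trojan_rate` for a model fine-tuned on the kept set versus one fine-tuned on the unfiltered corpus, with `clean_retention` confirming that the drop was not bought with discarded legitimate examples.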
Original abstract
As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality checks. To address this, we present SafeTune, a framework designed to harden LLM-based RTL generation against poisoning, specifically focusing on hardware Trojan (HT) insertion. SafeTune integrates two core components: (i) a Graph Neural Network (GNN) that models structural properties to identify anomalous circuitry patterns during fine-tuning, and (ii) a semantic verification module using text embeddings and an XGBoost classifier to assess prompt security. By coupling structural and semantic knowledge, SafeTune effectively filters poisoned inputs without sacrificing legitimate data. Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning without requiring modifications to the underlying model architecture.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SafeTune, a framework to protect LLM fine-tuning for RTL code generation from data poisoning attacks involving hardware Trojans. It uses a GNN to detect structural anomalies in circuitry and combines it with semantic embeddings fed to an XGBoost classifier to filter out poisoned prompts. The authors assert that this dual approach filters poisoned inputs effectively without sacrificing legitimate data and improves model robustness, all without changing the LLM architecture, and they support these claims with experimental results.
Significance. If the filtering mechanism proves effective with low false positives and the experiments confirm improved security in generated RTL code, this could be a meaningful contribution to securing AI-driven hardware design flows against supply-chain style attacks via poisoned training data. The combination of graph-based structural analysis and semantic classification is an interesting direction for this emerging problem.
major comments (2)
- [Abstract] The statement 'Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning' is not accompanied by any quantitative evidence, such as specific metrics (e.g., Trojan detection rate, false positive rate on clean data), datasets used, baselines compared, or error bars. This absence undermines the central claim that the method works without sacrificing legitimate data.
- [Proposed Framework] The description of the GNN for modeling structural properties and the semantic verification module lacks details on how the XGBoost classifier is trained (e.g., features, labeled poisoned/clean examples), decision thresholds, or validation procedures to ensure high precision without discarding useful RTL examples. This is load-bearing for the 'no sacrifice' guarantee highlighted in the abstract.
minor comments (1)
- [Abstract] The abstract could benefit from a brief mention of the specific LLM or RTL dataset used in experiments to provide context.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate the suggested clarifications and quantitative details.
Point-by-point responses
- Referee: [Abstract] The statement 'Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning' is not accompanied by any quantitative evidence, such as specific metrics (e.g., Trojan detection rate, false positive rate on clean data), datasets used, baselines compared, or error bars. This absence undermines the central claim that the method works without sacrificing legitimate data.
  Authors: We agree that the abstract would be strengthened by including specific quantitative evidence. In the revised manuscript, we will update the abstract to report key metrics from our experiments, including Trojan detection rate, false positive rate on clean data, the datasets and baselines used, and error bars from multiple runs. This will directly support the claim of effective filtering without sacrificing legitimate data. Revision: yes.
- Referee: [Proposed Framework] The description of the GNN for modeling structural properties and the semantic verification module lacks details on how the XGBoost classifier is trained (e.g., features, labeled poisoned/clean examples), decision thresholds, or validation procedures to ensure high precision without discarding useful RTL examples. This is load-bearing for the 'no sacrifice' guarantee highlighted in the abstract.
  Authors: We concur that more implementation details are needed for the semantic verification module to substantiate the no-sacrifice guarantee. In the revision, we will expand the Proposed Framework section with specifics on: the feature set extracted for the XGBoost classifier, how labeled poisoned and clean RTL examples were generated and used for training, the decision threshold selection process, and the validation procedures (including precision-recall metrics and cross-validation) that confirm high precision while retaining useful examples. These additions will make the dual GNN-semantic approach fully reproducible and transparent. Revision: yes.
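For readers wondering what the promised threshold-selection and cross-validation procedure might look like, here is one hedged possibility, not the authors' code: pick the XGBoost decision threshold that maximizes recall on poisoned examples subject to a precision floor, using out-of-fold scores so the threshold is never tuned on data the model has already seen.

```python
# Illustrative threshold selection under a precision floor; an assumption
# about how the revision's procedure could work, not the paper's method.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

def pick_threshold(X, y, precision_floor=0.95):
    """X: embedding features; y: 1 = poisoned, 0 = clean (labels assumed available)."""
    clf = xgb.XGBClassifier(n_estimators=200, max_depth=4)
    # Out-of-fold probabilities from 5-fold cross-validation.
    scores = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    prec, rec, thr = precision_recall_curve(y, scores)
    # precision_recall_curve returns len(thr) + 1 precision/recall points;
    # drop the last point so the arrays align with the thresholds.
    ok = prec[:-1] >= precision_floor
    if not ok.any():
        return 0.5  # fall back if the precision floor is unattainable
    # Among thresholds meeting the floor, keep the one with the highest recall.
    return float(thr[ok][np.argmax(rec[:-1][ok])])
```

A high precision floor is what protects clean RTL examples from being discarded; recall maximization then bounds how many poisoned examples slip through.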
Circularity Check
No significant circularity; empirical framework with no derivations or self-referential predictions
Full rationale
The paper presents SafeTune as an empirical filtering framework combining GNN-based structural anomaly detection with semantic embeddings and XGBoost classification to remove poisoned RTL examples. No equations, parameter fittings, or derivation chains are described in the abstract or referenced text. The central claim of effective filtering 'without sacrificing legitimate data' is an empirical assertion supported by (unshown) experiments rather than a mathematical reduction to inputs. No self-citations, ansatzes, or uniqueness theorems are invoked as load-bearing steps. This is a standard non-circular empirical proposal whose validity rests on external validation of the classifier's precision/recall, not internal definitional equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Poisoned training data for RTL generation exhibits detectable structural anomalies in circuit graphs and semantic inconsistencies in prompts that can be identified by GNN and XGBoost without excessive false positives.
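This assumption only has teeth if RTL designs can be rendered as graphs in which Trojan structure is visible. A minimal sketch of one such encoding, invented here for illustration: nodes are signals, edges follow assignment dataflow, and node features are fan-in/fan-out counts, since trigger logic often shows unusual fan-in on rarely exercised signals.

```python
# Toy RTL-to-graph encoding; the tiny "netlist" format is invented for
# illustration and is not the paper's graph construction.
import torch

def netlist_to_graph(assigns):
    """assigns: list of (dst, [srcs]) pairs, e.g. from parsed `assign` statements."""
    nodes = sorted({s for dst, srcs in assigns for s in [dst, *srcs]})
    idx = {name: i for i, name in enumerate(nodes)}
    A = torch.zeros(len(nodes), len(nodes))
    for dst, srcs in assigns:
        for s in srcs:
            A[idx[s], idx[dst]] = 1.0      # edge: source signal drives destination
    # Node features: in-degree (fan-in) and out-degree (fan-out) per signal.
    X = torch.stack([A.sum(0), A.sum(1)], dim=1)
    return A, X, nodes

# e.g. `assign y = a & b; assign trig = a & b & c & d;` -- note trig's wide fan-in.
A, X, names = netlist_to_graph([("y", ["a", "b"]), ("trig", ["a", "b", "c", "d"])])
```

Graphs of this shape are what a structural detector like the GNN component would consume.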
Reference graph
Works this paper leans on
- [1] S. Thakur et al., “VeriGen: A large language model for Verilog code generation,” ACM Transactions on Design Automation of Electronic Systems, vol. 29, no. 3, May 2024.
- [2] M. Akyash, K. Azar, and H. Kamali, “RTL++: Graph-enhanced LLM for RTL code generation,” in 2025 IEEE International Conference on LLM-Aided Design (ICLAD), 2025, pp. 44–50.
- [3] N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “MeltRTL: Multi-expert LLMs with inference-time intervention for RTL code generation,” arXiv preprint arXiv:2601.13015, 2026.
- [4] M. Akyash, K. Azar, and H. Kamali, “DecoRTL: A run-time decoding framework for RTL code generation with LLMs,” in 2025 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2025, pp. 1–9.
- [5] K. Maddala, B. Mali, and C. Karfa, “LAAG-RV: LLM assisted assertion generation for RTL design verification,” in 2024 IEEE 8th International Test Conference India (ITC India), 2024, pp. 1–6.
- [6] R. Kande, H. Pearce, B. Tan, B. Dolan-Gavitt, S. Thakur, R. Karri, and J. Rajendran, “(Security) assertions by large language models,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 4374–4389, 2024.
- [7] N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “LLM-IFT: LLM-powered information flow tracking for secure hardware,” in 2025 IEEE 43rd VLSI Test Symposium (VTS), 2025, pp. 1–5.
- [8] H. Wu, Z. He, X. Zhang, X. Yao, S. Zheng, H. Zheng, and B. Yu, “ChatEDA: A large language model powered autonomous agent for EDA,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 10, pp. 3184–3197, 2024.
- [9] J. Pan et al., “A survey of research in large language models for electronic design automation,” ACM Transactions on Design Automation of Electronic Systems, 2025, to appear.
- [10] M. Akyash and H. M. Kamali, “Evolutionary large language models for hardware security: A comparative survey,” in Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI), 2024, pp. 496–501.
- [11] S. Liu et al., “RTLCoder: Outperforming GPT-3.5 in design RTL generation with our open-source dataset and lightweight solution,” in IEEE International Workshop on LLM-Aided Design (LAD), 2024, pp. 1–5.
- [12] A. Wei, H. Tan, T. Suresh, D. Mendoza, T. S. F. X. Teixeira, K. Wang, C. Trippel, and A. Aiken, “VeriCoder: Enhancing LLM-based RTL code generation through functional correctness validation,” arXiv preprint arXiv:2504.15659, 2025.
- [13] L. L. Mankali et al., “RTL-Breaker: Assessing the security of LLMs against backdoor attacks on HDL code generation,” in Proceedings of the Design, Automation and Test in Europe Conference (DATE), 2025, pp. 1–7.
- [14] S. Thakur, A. Singh, V. W. Lee, and S. Garg, “Benchmarking large language models for automated Verilog RTL code generation,” in Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE), Apr. 2023, pp. 1–6.
- [15] Y. Lu et al., “RTLLM: An open-source benchmark for design RTL generation with large language models,” in Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024, pp. 1–7.
- [16] N. Pinckney et al., “Comprehensive Verilog design problems (CVDP): A next-generation benchmark dataset for evaluating large language models and agents on RTL design and verification,” arXiv preprint arXiv:2506.14074, 2025.
- [17] Z. Wang et al., “SALAD: Systematic assessment of machine unlearning on LLM-aided hardware design,” arXiv preprint arXiv:2506.02089, 2025.
- [18] N. Mashnoor, M. Akyash, H. Kamali, and K. Azar, “CircuitGuard: Mitigating LLM memorization in RTL code generation against IP leakage,” in 2025 IEEE 43rd International Conference on Computer Design (ICCD), 2025, pp. 790–797.
- [19] S. Liu et al., “OpenLLM-RTL: Open dataset and benchmark for LLM-aided design RTL generation,” in Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2024.
- [20] A. Jain, Z. Zhou, and U. Guin, “Survey of recent developments for hardware Trojan detection,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2021, pp. 1–5.
- [21] Y. Qu, X. Han, H. Wu, and S. Cheng, “BadCodePrompt: Backdoor attacks against prompt engineering of large language models for code generation,” Automated Software Engineering, vol. 32, no. 1, p. 17, 2025.
- [22] T. Fu et al., “PoisonBench: Assessing large language model vulnerability to data poisoning,” arXiv preprint arXiv:2410.08811, 2024.
- [23] Y. Li et al., “Verilog-to-PyG: A framework for graph learning and augmentation on RTL designs,” in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023, pp. 1–8.
- [24] H. Lashen, L. Alrahis, J. Knechtel, and O. Sinanoglu, “TrojanSAINT: Gate-level netlist sampling-based inductive learning for hardware Trojan detection,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2023, pp. 1–5.
- [25] R. Yasaei et al., “Hardware Trojan detection using graph neural networks,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 1, pp. 25–38, 2022.
- [26] M. A. Ayub and S. Majumdar, “Embedding-based classifiers can detect prompt injection attacks,” arXiv preprint arXiv:2410.22284, 2024.
- [27] M. Akyash, K. Azar, and H. M. Kamali, “RTL++: Graph-enhanced LLM for RTL code generation,” arXiv preprint arXiv:2505.13479, 2025.
- [28] H. Salmani, M. Tehranipoor, and R. Karri, “On design vulnerability analysis and trust benchmark development,” in Proceedings of the IEEE International Conference on Computer Design (ICCD), 2013, pp. 1–8.
- [29] M. O. Faruque, P. Jamieson, A. Patooghy, and A.-H. A. Badawy, “Unleashing GHOST: An LLM-powered framework for automated hardware Trojan design,” arXiv preprint arXiv:2412.02816, 2024.
- [30] Z. Li, X. Zhang, Y. Zhang, D. Long, P. Xie, and M. Zhang, “Towards general text embeddings with multi-stage contrastive learning,” arXiv preprint arXiv:2308.03281, 2023.