Decomposing Memorization Reduction in Privacy-Preserving Fine-Tuning of SLMs for CSIRTs
Pith reviewed 2026-06-30 01:21 UTC · model grok-4.3
The pith
Reducing the optimizer update count alone explains, on average, the full drop in memorization observed under DP-SGD fine-tuning of 1B-3B models on CSIRT data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Matched-update controls reproduce the observed reduction in memorization by reducing the number of optimizer updates alone, accounting for 66 to 132 percent of the measured effect (mean 100 percent across three seeds and four models). In this setting, DP-SGD provides the formal privacy guarantee but does not produce additional measurable reductions in memorization. HMAC pseudonymization removes the original identifiers from the exposure surface, reducing exposure by 40 to 61 percent, while pseudonymized identifiers remain close to the expected random baseline.
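One way to read the 66 to 132 percent figure is as a ratio of two reductions, each measured against the raw fine-tuning baseline. The sketch below assumes that definition, which the summary does not state explicitly; the function name and the numbers are hypothetical and not taken from the paper.

```python
def fraction_explained(raw: float, matched: float, dp: float) -> float:
    """Share of the DP-SGD memorization drop reproduced by the matched-update control.

    raw, matched and dp are memorization scores (for example mean canary
    exposure) for raw fine-tuning, the non-DP matched-update control, and DP-SGD.
    ASSUMPTION: this ratio is one plausible reading of the reported percentage.
    """
    dp_drop = raw - dp
    if dp_drop == 0:
        raise ValueError("DP-SGD shows no reduction; the ratio is undefined")
    return (raw - matched) / dp_drop


# Hypothetical scores, for illustration only.
print(fraction_explained(raw=10.0, matched=4.0, dp=4.0))  # control matches DP-SGD
print(fraction_explained(raw=10.0, matched=2.0, dp=4.0))  # control drops more than DP-SGD
```

Under this reading, a value above 1 means the control reduced memorization more than DP-SGD did, which is how the reported range can exceed 100 percent.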
What carries the argument
Matched-update controls: non-DP runs trained for the same number of optimizer updates as the DP-SGD runs, so that the two differ only in the DP mechanism (gradient clipping and noise).
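To make "same number of optimizer updates" concrete, the sketch below counts updates for a run. It assumes the drop in updates comes from a larger effective batch at a fixed number of epochs, which the summary does not spell out; the dataset size, batch sizes, and function name are hypothetical.

```python
import math


def optimizer_updates(num_examples: int, epochs: int, batch_size: int,
                      grad_accum: int = 1) -> int:
    """Number of optimizer updates in a run (the last partial batch is kept)."""
    micro_batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs


# Hypothetical configuration: 10,000 records, 3 epochs.
small_batch = optimizer_updates(10_000, epochs=3, batch_size=8)
large_batch = optimizer_updates(10_000, epochs=3, batch_size=256)
print(small_batch, large_batch)
```

A matched-update control in this sketch would be a non-DP run configured to take `large_batch` updates rather than `small_batch`.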
If this is right
- DP-SGD's formal privacy guarantee does not deliver extra memorization reduction once the update count is matched.
- HMAC pseudonymization lowers identifier exposure without creating secondary memorization targets.
- Under the evaluated budget, 1B-3B SLMs on CSIRT data stay below operationally useful F1 performance.
- Training regimes that cut optimizer steps can achieve the same memorization drop as DP-SGD without the noise overhead, though also without its formal guarantee.
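For readers unfamiliar with HMAC pseudonymization, here is a minimal sketch of the general technique using Python's standard library. The hash function (SHA-256), the truncation length, the key, and the sample identifiers are all assumptions for illustration; the summary does not say how the paper configures its pseudonymizer.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    """Replace an identifier with a keyed, deterministic pseudonym.

    ASSUMPTION: SHA-256 and a 16-hex-character truncation are illustrative choices.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


key = b"example-key-do-not-use"  # hypothetical key; keep real keys out of source code
same_a = pseudonymize("10.0.12.7", key)
same_b = pseudonymize("10.0.12.7", key)
other = pseudonymize("10.0.12.8", key)
print(same_a == same_b, same_a == other)
```

The same identifier always maps to the same pseudonym under a fixed key, so records can still be joined, while the original value no longer appears in the training text.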
Where Pith is reading between the lines
- Practitioners may achieve similar memorization control by simply shortening training rather than adding DP noise.
- The gap between formal DP guarantees and empirical memorization protection could affect how regulators weigh DP-SGD for structured incident data.
- Future work could test whether larger models or different data structures change the dominance of update count over DP noise.
Load-bearing premise
The twenty planted canaries plus the four extraction attacks and dual HMAC attack form a sufficient probe of memorization risk for the tested CSIRT data and 1B-3B models.
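Canary audits of this kind are commonly scored with the exposure metric from Carlini et al.'s "The Secret Sharer": the canary is ranked by model loss against other candidates from the same random space, and exposure is log2 of the space size minus log2 of the rank. The sketch below assumes the paper uses a metric of this form, which the summary does not confirm; the loss values are hypothetical.

```python
import math


def exposure(canary_loss: float, candidate_losses: list[float]) -> float:
    """Rank-based exposure of one canary.

    candidate_losses holds the model's loss on the other candidates drawn
    from the same randomness space. Rank 1 means the canary is the most
    likely candidate, giving the maximum exposure log2(space size).
    """
    rank = 1 + sum(1 for loss in candidate_losses if loss < canary_loss)
    space_size = len(candidate_losses) + 1
    return math.log2(space_size) - math.log2(rank)


# Hypothetical losses for three non-canary candidates.
others = [5.0, 6.0, 7.0]
print(exposure(4.0, others))  # canary ranked first
print(exposure(8.0, others))  # canary ranked last
```

With only 20 canaries, each per-canary exposure carries substantial weight in any average, which is the concern behind the premise above.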
What would settle it
Repeating the experiments with substantially more canaries or stronger extraction attacks that recover additional memorized sequences beyond what the update-count controls predict.
Original abstract
CSIRTs increasingly fine tune language models on vulnerability scan records, but these records expose internal network topology and create privacy risks under regulations such as GDPR and LGPD. We present the first empirical study of how DP SGD and HMAC pseudonymization interact when fine tuning small language models with 1B to 3B parameters on structured CSIRT data. We evaluate 96 LoRA adapters across four SLMs and four training regimes, including raw fine tuning, QLoRA with large batch training, and DP SGD with epsilon equal to 2 and 8. We also audit memorization using 20 planted canaries, four extraction attacks, and a dual attack targeting HMAC pseudonymized identifiers. Our results show three main findings. First, matched update controls reproduce the observed reduction in memorization by reducing the number of optimizer updates alone, accounting for 66 percent to 132 percent of the measured effect, with a mean of 100 percent across three seeds and four models. In this setting, DP SGD provides the formal privacy guarantee but does not produce additional measurable reductions in memorization. Second, HMAC pseudonymization removes the original identifiers from the exposure surface, reducing exposure by 40 percent to 61 percent, while pseudonymized identifiers remain close to the expected random baseline and do not become a secondary memorization target. Third, F1 scores remain between 0.19 and 0.28 across all 96 adapters using four shot prompting, indicating that, under the evaluated training budget, 1B to 3B SLMs do not achieve operationally useful performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports the first empirical study of DP-SGD and HMAC pseudonymization for fine-tuning 1B-3B SLMs on structured CSIRT vulnerability records. Across 96 LoRA adapters, four models, and four regimes (raw, QLoRA, DP-SGD at ε=2/8), it audits memorization via 20 planted canaries plus four extraction attacks and a dual HMAC attack. Central claims: matched-update non-DP controls reproduce 66–132 % (mean 100 %) of the observed memorization reduction, so DP-SGD supplies the formal guarantee but no additional measurable reduction; HMAC cuts exposure 40–61 % without creating secondary targets; and four-shot F1 remains 0.19–0.28, below operational utility.
Significance. If the decomposition holds, the work indicates that, for memorization risk on this data type, the empirical effect of DP-SGD is largely replicable by simply limiting optimizer steps, while the formal (ε,δ) guarantee remains the distinctive contribution. The controlled multi-seed, multi-model design and explicit attack suite strengthen the result. The finding that HMAC pseudonymization is effective without leakage into the model is practically useful for regulated CSIRT settings. Low task performance underscores the need for larger data or models but does not undermine the privacy analysis.
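To show what DP-SGD adds on top of a plain optimizer step, the sketch below implements one textbook DP-SGD update in pure Python: per-example gradient clipping, Gaussian noise on the summed gradient, then an averaged step. It is a generic illustration, not the paper's training code; the parameter names and values are hypothetical, and privacy accounting for (ε,δ) is omitted.

```python
import math
import random


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter vector."""
    dim = len(params)
    summed = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale each example's gradient so its L2 norm is at most clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += grad[i] * scale
    sigma = noise_multiplier * clip_norm
    batch = len(per_example_grads)
    # Add Gaussian noise to the clipped sum, then average over the batch.
    noisy_mean = [(summed[i] + rng.gauss(0.0, sigma)) / batch for i in range(dim)]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.6, 0.8]]  # hypothetical per-example gradients
print(dp_sgd_step([0.0, 0.0], grads, lr=1.0, clip_norm=1.0,
                  noise_multiplier=1.1, rng=rng))
```

Setting `noise_multiplier` to 0 leaves only the clipping, which is one way to separate the two components the referee mentions later.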
major comments (2)
- [Abstract / Results (first finding)] The claim that matched-update controls account for a mean of 100 % (range 66–132 %) of the memorization reduction rests on the assumption that the 20-canary, four-attack probe would have detected any additional DP-specific effect if one existed. In structured CSIRT records the canaries occupy a narrow slice of the identifier space; a diffuse DP effect on non-canary tokens could therefore produce a null result even if a real difference is present. The range crossing 100 % already signals a possible mismatch between regimes on dimensions other than step count.
- [Experimental design] The paper does not report a power analysis or sensitivity study for the 20-canary probe. With only 20 planted examples and four extraction methods, it is unclear whether the measurement has sufficient statistical power to rule out a modest additional reduction attributable to noise or clipping once update count is controlled.
minor comments (2)
- [Abstract] The abstract states three seeds and four models but does not indicate whether the 66–132 % range reflects per-seed or per-model variation; a table or figure breaking this down would clarify reproducibility.
- [Results] No mention of statistical significance tests (e.g., paired t-tests or bootstrap intervals) on the memorization metrics; adding these would strengthen the “no additional measurable reduction” conclusion.
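The bootstrap interval suggested in the last comment could look like the sketch below, which resamples paired differences between the DP-SGD run and its matched-update control. The differences are made-up placeholders for the 12 model-seed combinations, not results from the paper, and the percentile method is one choice among several.

```python
import random
import statistics


def bootstrap_ci(diffs, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical differences (matched control minus DP-SGD), one per model-seed pair.
placeholder_diffs = [0.3, -0.2, 0.1, 0.0, -0.4, 0.5, 0.2, -0.1, 0.0, 0.1, -0.3, 0.2]
print(bootstrap_ci(placeholder_diffs))
```

An interval that contains zero would be consistent with "no additional measurable reduction", though with so few pairs it may also be wide enough to hide a modest effect.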
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive feedback on our manuscript. Below we respond point-by-point to the major comments.
- Referee: [Abstract / Results (first finding)] The claim that matched-update controls account for a mean of 100 % (range 66–132 %) of the memorization reduction rests on the assumption that the 20-canary, four-attack probe would have detected any additional DP-specific effect if one existed. In structured CSIRT records the canaries occupy a narrow slice of the identifier space; a diffuse DP effect on non-canary tokens could therefore produce a null result even if a real difference is present. The range crossing 100 % already signals a possible mismatch between regimes on dimensions other than step count.
  Authors: The canary construction and extraction attacks were deliberately focused on the identifier fields, which constitute the primary privacy-sensitive content in CSIRT vulnerability records under the relevant regulations. The four attacks are identifier-recovery attacks; any diffuse DP effect on non-identifier tokens would not alter the exposure metric we report. The observed range (66–132 %) is attributable to stochastic variation across models and seeds; the mean of 100 % is computed over the full set of 12 model-seed combinations and remains centered on full replication. We will add an explicit paragraph in the revised results section clarifying the scope of the probe and noting that non-identifier effects lie outside the evaluated privacy risk.
  Revision: partial
- Referee: [Experimental design] The paper does not report a power analysis or sensitivity study for the 20-canary probe. With only 20 planted examples and four extraction methods, it is unclear whether the measurement has sufficient statistical power to rule out a modest additional reduction attributable to noise or clipping once update count is controlled.
  Authors: We acknowledge that a formal power analysis was omitted. The 20-canary design was selected to keep the planted set small relative to the training corpus while still covering multiple identifier formats; consistency of the matched-update result across four models and three seeds provides indirect evidence of robustness. In revision we will add a post-hoc sensitivity subsection that reports (i) the detectable effect size given the observed per-canary variance and (ii) results when the probe is restricted to random subsets of 10–15 canaries.
  Revision: yes
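The two promised analyses can be sketched as follows: (i) a minimum detectable effect for a paired design, using a normal approximation, and (ii) the spread of the mean when the probe is restricted to random subsets of 10 or 15 canaries. The standard deviation and per-canary values are hypothetical placeholders, and for n = 20 a t-based calculation would give a slightly larger threshold than the normal approximation used here.

```python
import math
import random
import statistics
from statistics import NormalDist


def min_detectable_effect(sd: float, n: int, alpha: float = 0.05,
                          power: float = 0.8) -> float:
    """Smallest mean difference detectable at the given power (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd / math.sqrt(n)


def subset_mean_ranges(diffs, sizes=(10, 15), draws=1000, seed=0):
    """Min and max of the mean over random canary subsets of each size."""
    rng = random.Random(seed)
    ranges = {}
    for k in sizes:
        means = [statistics.fmean(rng.sample(diffs, k)) for _ in range(draws)]
        ranges[k] = (min(means), max(means))
    return ranges


# Hypothetical per-canary exposure differences for 20 canaries.
per_canary = [0.1 * ((i * 7) % 11 - 5) for i in range(20)]
print(min_detectable_effect(sd=1.0, n=20))
print(subset_mean_ranges(per_canary))
```

If the minimum detectable effect is large relative to the differences observed between regimes, the null result on DP-specific reduction would be weak evidence rather than a demonstration of no effect.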
Circularity Check
No circularity: purely empirical comparisons with no derivations or fitted inputs
The paper reports direct experimental measurements across 96 LoRA adapters on four SLMs under four regimes (raw fine-tuning, QLoRA, DP-SGD at ε=2/8), using 20 planted canaries, four extraction attacks, and a dual HMAC attack to quantify memorization. The central claim, that matched-update non-DP controls account for 66–132 % (mean 100 %) of the observed memorization drop, is obtained by comparing measured attack success rates between regimes that differ only in optimizer-step count. No equations, fitted parameters, or self-citation chains are invoked to derive the target quantity; the result is a straightforward empirical ratio of observed values. The paper contains no mathematical derivation chain, no ansatz smuggled in via citation, and no uniqueness theorems. The work is therefore self-contained against external benchmarks and receives a circularity score of 0.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the 20 planted canaries, the four extraction attacks, and the dual HMAC attack form a representative probe of memorization for structured CSIRT records.