Decomposing Memorization Reduction in Privacy-Preserving Fine-Tuning of SLMs for CSIRTs

Cristhian Kapelinski; Diego Kreutz

arxiv: 2606.28479 · v1 · pith:O3XQZJTYnew · submitted 2026-06-26 · 💻 cs.CR · cs.AI

Pith reviewed 2026-06-30 01:21 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords memorization · differential privacy · DP-SGD · fine-tuning · small language models · CSIRT data · HMAC pseudonymization · LoRA adapters

The pith

Cutting the number of optimizer updates alone accounts, on average, for the full drop in memorization seen under DP-SGD fine-tuning of 1B-3B models on CSIRT data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether differential privacy via DP-SGD and HMAC pseudonymization reduce memorization when fine-tuning small language models on structured vulnerability records. It compares 96 LoRA adapters across raw fine-tuning, large-batch QLoRA, and DP-SGD at epsilon 2 and 8, using planted canaries and extraction attacks to measure exposure. The central result is that matched controls isolating update count reproduce 66 to 132 percent of the observed memorization reduction, with a mean of 100 percent, so DP-SGD supplies the formal guarantee without adding measurable protection beyond what fewer steps already give. HMAC pseudonymization removes the original identifiers from the exposure surface, cutting exposure by 40 to 61 percent, without turning the pseudonyms into new targets. Model utility remains low, with F1 scores between 0.19 and 0.28 under four-shot prompting.

Core claim

Matched-update controls reproduce the observed reduction in memorization by reducing the number of optimizer updates alone, accounting for 66 percent to 132 percent of the measured effect, with a mean of 100 percent across three seeds and four models. In this setting, DP-SGD provides the formal privacy guarantee but does not produce additional measurable reductions in memorization. HMAC pseudonymization removes the original identifiers from the exposure surface, reducing exposure by 40 percent to 61 percent, while pseudonymized identifiers remain close to the expected random baseline.
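The percentages in this claim can be read as a ratio of two drops in a memorization score. The sketch below is one plausible way to compute that ratio; the function name, the formula, and all numbers are illustrative assumptions, not values or code from the paper.

```python
def fraction_reproduced(raw: float, control: float, dp: float) -> float:
    """Share of the raw-to-DP memorization drop that the matched control reproduces.

    raw     : memorization score of the raw fine-tuned adapter
    control : score of the non-DP control with the same update count as the DP run
    dp      : score of the DP-SGD adapter
    """
    total_drop = raw - dp
    if total_drop == 0:
        raise ValueError("no memorization drop to decompose")
    return (raw - control) / total_drop


# Hypothetical scores for three model-seed combinations (not from the paper).
runs = [
    {"raw": 8.0, "control": 4.0, "dp": 4.0},  # control matches DP exactly
    {"raw": 8.0, "control": 5.0, "dp": 3.5},  # control reproduces part of the drop
    {"raw": 8.0, "control": 3.0, "dp": 4.0},  # control drops further than DP
]
fractions = [fraction_reproduced(**run) for run in runs]
mean_fraction = sum(fractions) / len(fractions)
print([round(f, 3) for f in fractions], round(mean_fraction, 3))
```

A value above 1.0 means the control's score fell below the DP-SGD run's score, which is how a range such as 66 to 132 percent can straddle 100 percent.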

What carries the argument

Matched-update controls: non-DP runs trained with the same number of optimizer updates as the DP-SGD runs, so that the two differ only in the DP mechanism (clipping and noise).

If this is right

DP-SGD's formal privacy guarantee does not deliver extra memorization reduction once update count is matched.
HMAC pseudonymization lowers identifier exposure without creating secondary memorization targets.
Under the evaluated budget, 1B-3B SLMs on CSIRT data stay below operationally useful F1 performance.
Training regimes that cut optimizer steps can achieve the same memorization drop as DP without the noise overhead.
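To make "fewer optimizer steps" concrete, the sketch below counts updates for a small-batch and a large-batch regime over the same data. The dataset size, batch sizes, and epoch count are hypothetical and were chosen only for illustration.

```python
import math


def optimizer_updates(n_examples: int, effective_batch_size: int, epochs: int) -> int:
    """Number of optimizer updates when every epoch is split into full or partial batches."""
    updates_per_epoch = math.ceil(n_examples / effective_batch_size)
    return updates_per_epoch * epochs


# Hypothetical settings (not taken from the paper).
small_batch = optimizer_updates(n_examples=10_000, effective_batch_size=8, epochs=3)
large_batch = optimizer_updates(n_examples=10_000, effective_batch_size=256, epochs=3)
print(small_batch, large_batch)
```

Under these assumed settings the large-batch run sees the same records but applies about thirty times fewer updates, which is the quantity a matched-update control holds equal to the DP run.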

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners may achieve similar memorization control by simply shortening training rather than adding DP noise.
The gap between formal DP guarantees and empirical memorization protection could affect how regulators weigh DP SGD for structured incident data.
Future work could test whether larger models or different data structures change the dominance of update count over DP noise.

Load-bearing premise

The twenty planted canaries plus the four extraction attacks and dual HMAC attack form a sufficient probe of memorization risk for the tested CSIRT data and 1B-3B models.
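For readers unfamiliar with canary probes, a common way to score them is the rank-based exposure metric from the Secret Sharer line of work: the canary's loss is ranked against the losses of random candidates of the same format. The sketch below assumes that metric; the paper's exact scoring may differ, and the loss values are made up.

```python
import math
from typing import Sequence


def exposure(canary_loss: float, reference_losses: Sequence[float]) -> float:
    """Rank-based exposure in bits: log2(#candidates) - log2(rank of the canary).

    Rank 1 means the model finds the canary more likely (lower loss) than
    every random reference candidate.
    """
    rank = 1 + sum(1 for loss in reference_losses if loss < canary_loss)
    candidates = len(reference_losses) + 1  # references plus the canary itself
    return math.log2(candidates) - math.log2(rank)


# Hypothetical losses: 1023 random candidates, all scored at loss 1.0.
references = [1.0] * 1023
memorized = exposure(0.5, references)      # canary ranked first
not_memorized = exposure(2.0, references)  # canary ranked last
print(memorized, not_memorized)
```

With only 20 canaries, each run contributes just 20 such scores, which is the narrowness this premise concerns.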

What would settle it

Repeating the experiments with substantially more canaries or stronger extraction attacks that recover additional memorized sequences beyond what the update-count controls predict.

Original abstract

CSIRTs increasingly fine-tune language models on vulnerability scan records, but these records expose internal network topology and create privacy risks under regulations such as GDPR and LGPD. We present the first empirical study of how DP-SGD and HMAC pseudonymization interact when fine-tuning small language models with 1B to 3B parameters on structured CSIRT data. We evaluate 96 LoRA adapters across four SLMs and four training regimes, including raw fine-tuning, QLoRA with large-batch training, and DP-SGD with epsilon equal to 2 and 8. We also audit memorization using 20 planted canaries, four extraction attacks, and a dual attack targeting HMAC-pseudonymized identifiers. Our results show three main findings. First, matched-update controls reproduce the observed reduction in memorization by reducing the number of optimizer updates alone, accounting for 66 percent to 132 percent of the measured effect, with a mean of 100 percent across three seeds and four models. In this setting, DP-SGD provides the formal privacy guarantee but does not produce additional measurable reductions in memorization. Second, HMAC pseudonymization removes the original identifiers from the exposure surface, reducing exposure by 40 percent to 61 percent, while pseudonymized identifiers remain close to the expected random baseline and do not become a secondary memorization target. Third, F1 scores remain between 0.19 and 0.28 across all 96 adapters using four-shot prompting, indicating that, under the evaluated training budget, 1B to 3B SLMs do not achieve operationally useful performance.
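As background for the DP-SGD regimes named in the abstract, the sketch below shows the core of one DP-SGD gradient computation as described by Abadi et al.: clip each per-example gradient, sum, add Gaussian noise, and average. It uses plain Python lists and is a teaching sketch under assumed parameter names, not the training code used in the paper.

```python
import math
import random


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return the noisy, clipped, averaged gradient for one batch."""
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for grad in per_example_grads:
        # Clip: rescale so the gradient's L2 norm is at most clip_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Noise: Gaussian with standard deviation noise_multiplier * clip_norm.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical two-example batch with 2-dimensional gradients.
batch = [[3.0, 4.0], [0.3, 0.4]]
no_noise = dp_sgd_gradient(batch, clip_norm=1.0, noise_multiplier=0.0, rng=random.Random(0))
with_noise = dp_sgd_gradient(batch, clip_norm=1.0, noise_multiplier=1.0, rng=random.Random(0))
print(no_noise, with_noise)
```

Setting `noise_multiplier` to 0 leaves clipping only. A matched-update control in the sense of the abstract drops both clipping and noise and keeps only the update count.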

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DP-SGD's memorization drop in this CSIRT setup is explained by fewer optimizer updates, not the privacy noise itself.

The main finding is that matched non-DP controls with the same number of updates reproduce the full memorization reduction seen under DP-SGD, averaging 100% of the effect across seeds and models. DP-SGD supplies the formal guarantee but adds nothing measurable on the canary probes.

The work runs a controlled comparison across 96 LoRA adapters on four 1B-3B models, three seeds, and regimes that include raw fine-tuning, large-batch QLoRA, and DP-SGD at epsilon 2 and 8. They also test HMAC pseudonymization with a dual attack. The decomposition is direct and the HMAC result is clean: it cuts exposure 40-61% without turning the pseudonyms into new targets.
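For context, HMAC pseudonymization replaces each identifier with a keyed hash, so the same input always maps to the same pseudonym while the original value cannot be recomputed without the key. The sketch below uses Python's standard `hmac` module; the choice of SHA-256, the 16-character truncation, the key, and the sample address are assumptions for illustration, not details from the paper.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    """Map an identifier to a truncated hex HMAC-SHA256 pseudonym."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key and identifier; a real key would come from a secret store.
key = b"example-key-do-not-use"
pseudonym = pseudonymize("10.0.12.34", key)
print(pseudonym)
```

Because the mapping is deterministic, records about the same host stay linkable after pseudonymization, which is why the letter's point about pseudonyms not becoming new memorization targets matters.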

The soft spot is the probe itself. Twenty planted canaries plus four extraction attacks may simply miss a diffuse DP effect that does not concentrate on those exact tokens. The reported 66-132% range already shows variability that could come from other differences between the DP and control runs. Performance is another limit: F1 stays at 0.19-0.28 under four-shot prompting, so no adapter reaches operationally useful levels anyway.

This is useful for CSIRT teams that must choose among privacy tools when fine-tuning on vulnerability records. It does not rewrite general privacy theory but supplies a concrete measurement for this narrow setting. The empirical design is solid enough to warrant referee time.

Referee Report

2 major / 2 minor

Summary. The paper reports the first empirical study of DP-SGD and HMAC pseudonymization for fine-tuning 1B-3B SLMs on structured CSIRT vulnerability records. Across 96 LoRA adapters, four models, and four regimes (raw, QLoRA, DP-SGD at ε=2/8), it audits memorization via 20 planted canaries plus four extraction attacks and a dual HMAC attack. Central claims: matched-update non-DP controls reproduce 66–132 % (mean 100 %) of the observed memorization reduction, so DP-SGD supplies the formal guarantee but no additional measurable reduction; HMAC cuts exposure 40–61 % without creating secondary targets; and four-shot F1 remains 0.19–0.28, below operational utility.

Significance. If the decomposition holds, the work indicates that, for memorization risk on this data type, the empirical effect of DP-SGD is largely replicable by simply limiting optimizer steps, while the formal (ε,δ) guarantee remains the distinctive contribution. The controlled multi-seed, multi-model design and explicit attack suite strengthen the result. The finding that HMAC pseudonymization is effective without leakage into the model is practically useful for regulated CSIRT settings. Low task performance underscores the need for larger data or models but does not undermine the privacy analysis.

major comments (2)

[Abstract / Results (first finding)] Abstract and results on the first finding: the claim that matched-update controls account for a mean of 100 % (range 66–132 %) of the memorization reduction rests on the assumption that the 20-canary, four-attack probe would have detected any additional DP-specific effect if one existed. In structured CSIRT records the canaries occupy a narrow slice of the identifier space; a diffuse DP effect on non-canary tokens could therefore produce a null result even if a real difference is present. The range crossing 100 % already signals possible mismatch between regimes on dimensions other than step count.
[Experimental design] Experimental design paragraph: the paper does not report a power analysis or sensitivity study for the 20-canary probe. With only 20 planted examples and four extraction methods, it is unclear whether the measurement has sufficient statistical power to rule out a modest additional reduction attributable to noise or clipping once update count is controlled.

minor comments (2)

[Abstract] The abstract states three seeds and four models but does not indicate whether the 66–132 % range reflects per-seed or per-model variation; a table or figure breaking this down would clarify reproducibility.
[Results] No mention of statistical significance tests (e.g., paired t-tests or bootstrap intervals) on the memorization metrics; adding these would strengthen the “no additional measurable reduction” conclusion.
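The bootstrap interval suggested in the last comment could look like the sketch below: a percentile bootstrap over paired differences between matched-control and DP-SGD scores, one pair per model-seed combination. The function name, defaults, and scores are hypothetical; this is not an analysis from the paper.

```python
import random


def paired_bootstrap_ci(control, dp, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference (control - dp)."""
    if len(control) != len(dp):
        raise ValueError("control and dp must be paired")
    diffs = [c - d for c, d in zip(control, dp)]
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Hypothetical exposure scores for 12 model-seed combinations.
control_scores = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.0, 4.1, 3.9, 4.2]
dp_scores = [4.0, 3.9, 4.2, 4.1, 3.8, 4.3, 4.1, 3.9, 4.0, 4.0, 4.0, 4.1]
low, high = paired_bootstrap_ci(control_scores, dp_scores)
print(low, high)
```

An interval that contains zero would be consistent with "no additional measurable reduction", though with only 12 pairs it would also be wide, which ties back to the power concern in the major comments.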

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and constructive feedback on our manuscript. Below we respond point-by-point to the major comments.

Referee: [Abstract / Results (first finding)] Abstract and results on the first finding: the claim that matched-update controls account for a mean of 100 % (range 66–132 %) of the memorization reduction rests on the assumption that the 20-canary, four-attack probe would have detected any additional DP-specific effect if one existed. In structured CSIRT records the canaries occupy a narrow slice of the identifier space; a diffuse DP effect on non-canary tokens could therefore produce a null result even if a real difference is present. The range crossing 100 % already signals possible mismatch between regimes on dimensions other than step count.

Authors: The canary construction and extraction attacks were deliberately focused on the identifier fields, which constitute the primary privacy-sensitive content in CSIRT vulnerability records under the relevant regulations. The four attacks are identifier-recovery attacks; any diffuse DP effect on non-identifier tokens would not alter the exposure metric we report. The observed range (66–132 %) is attributable to stochastic variation across models and seeds; the mean of 100 % is computed over the full set of 12 model-seed combinations and remains centered on full replication. We will add an explicit paragraph in the revised results section clarifying the scope of the probe and noting that non-identifier effects lie outside the evaluated privacy risk.
Revision: partial
Referee: [Experimental design] Experimental design paragraph: the paper does not report a power analysis or sensitivity study for the 20-canary probe. With only 20 planted examples and four extraction methods, it is unclear whether the measurement has sufficient statistical power to rule out a modest additional reduction attributable to noise or clipping once update count is controlled.

Authors: We acknowledge that a formal power analysis was omitted. The 20-canary design was selected to keep the planted set small relative to the training corpus while still covering multiple identifier formats; consistency of the matched-update result across four models and three seeds provides indirect evidence of robustness. In revision we will add a post-hoc sensitivity subsection that reports (i) the detectable effect size given the observed per-canary variance and (ii) results when the probe is restricted to random subsets of 10–15 canaries.
Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparisons with no derivations or fitted inputs

The paper reports direct experimental measurements across 96 LoRA adapters on four SLMs under four regimes (raw fine-tuning, QLoRA, DP-SGD at ε=2/8), using 20 planted canaries, four extraction attacks, and a dual HMAC attack to quantify memorization. The central claim, that matched-update non-DP controls account for 66–132 % (mean 100 %) of the observed memorization drop, is obtained by comparing measured attack success rates between regimes that differ only in optimizer-step count. No equations, fitted parameters, or self-citation chains are invoked to derive the target quantity; the result is a straightforward empirical ratio of observed values. The paper contains no mathematical derivation chain, no ansatz smuggled in via citation, and no uniqueness theorems. The work is therefore self-contained against external benchmarks and receives a circularity score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study relies on standard differential privacy parameters (epsilon 2 and 8) and LoRA adapters without introducing new fitted constants or postulated entities; the main empirical claims rest on the assumption that the chosen canary and attack setup measures the intended risk.

axioms (1)

Domain assumption: the 20 planted canaries, four extraction attacks, and dual HMAC attack form a representative probe of memorization for structured CSIRT records.
Invoked in the evaluation of memorization reduction across all regimes.


[1] [1]

Deep learning with differential privacy

Martin Abadi et al. Deep learning with differential privacy. InACM CCS, 2016

2016

[2] [2]

Alaa et al

Ahmed M. Alaa et al. How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models. InICML, 2022

2022

[3] [3]

On-premise SLMs vs

Gefté Almeida et al. On-premise SLMs vs. commercial LLMs: Prompt engineering and incident classification in SOCs and CSIRTs. InERRC, 2025

2025

[4] [4]

Large-scale differentially private BERT

Rohan Anil et al. Large-scale differentially private BERT. InEMNLP, 2022

2022

[5] [5]

AttackQA: Development and adoption of a dataset for assisting cybersecurity operations using fine-tuned and open-source LLMs.arXiv:2411.01073, 2024

Varun Badrinath Krishna. AttackQA: Development and adoption of a dataset for assisting cybersecurity operations using fine-tuned and open-source LLMs.arXiv:2411.01073, 2024

work page arXiv 2024

[6] [6]

New proofs for NMAC and HMAC: Security without collision resistance

Mihir Bellare. New proofs for NMAC and HMAC: Security without collision resistance. InCRYPTO, 2006

2006

[7] [7]

Controlling the false discovery rate: A practical and powerful approach to multiple testing.J

Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing.J. R. Stat. Soc. B, 57(1), 1995

1995

[8] [8]

Mitigating unintended memorization with LoRA in federated learning for LLMs.TMLR, 2026

Thierry Bossy et al. Mitigating unintended memorization with LoRA in federated learning for LLMs.TMLR, 2026. arXiv:2502.05087

work page arXiv 2026

[9] [9]

Brown et al

Tom B. Brown et al. Language models are few-shot learners. InNeurIPS, 2020

2020

[10] [10]

The secret sharer: Evaluating and testing unintended memorization in neural networks

Nicholas Carlini et al. The secret sharer: Evaluating and testing unintended memorization in neural networks. InUSENIX Security, 2019

2019

[11] [11]

Extracting training data from large language models

Nicholas Carlini et al. Extracting training data from large language models. InUSENIX Security, 2021

2021

[12] [12]

Quantifying memorization across neural language models

Nicholas Carlini et al. Quantifying memorization across neural language models. InICLR, 2023

2023

[13] [13]

Stealing part of a production language model

Nicholas Carlini et al. Stealing part of a production language model. InICML, 2024

2024

[14] [14]

QLoRA: Efficient finetuning of quantized LLMs

Tim Dettmers et al. QLoRA: Efficient finetuning of quantized LLMs. InNeurIPS, 2023

2023

[15] [15]

Do membership inference attacks work on large language models?arXiv:2402.07841, 2024

Michael Duan et al. Do membership inference attacks work on large language models?arXiv:2402.07841, 2024

work page arXiv 2024

[16] [16]

FIRST.Common Vulnerability Scoring System v3.1: Specification Document, 2019

2019

[17] [17]

Inverting gradients: How easy is it to break privacy in federated learning? InNeurIPS, 2020

Jonas Geiping et al. Inverting gradients: How easy is it to break privacy in federated learning? InNeurIPS, 2020

2020

[18] [18]

Gemma 3 Technical Report

Gemma Team. Gemma 3 technical report. Technical report, Google DeepMind, 2025. arXiv:2503.19786

work page internal anchor Pith review Pith/arXiv arXiv 2025

[19] [19]

Numerical composition of differential privacy

Sivakanth Gopi, Yin Tat Lee, and Lukas Wutschitz. Numerical composition of differential privacy. InNeurIPS, 2021

2021

[20] [20]

The Llama 3 Herd of Models

Aaron Grattafiori et al. The Llama 3 herd of models. Technical report, Meta AI, 2024. arXiv:2407.21783

work page internal anchor Pith review Pith/arXiv arXiv 2024

[21] [21]

Large Language Models for Security Operations Centers: A Comprehensive Survey

Ali Habibzadeh, Farid Feyzi, and Reza Ebrahimi Atani. Large language models for security operations centers: A comprehensive survey. arXiv:2509.10858, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[22] [22]

Measuring memorization in language models via probabilistic extraction

Jamie Hayes et al. Measuring memorization in language models via probabilistic extraction. InNAACL, 2025

2025

[23] [23]

Train longer, generalize better: Closing the generalization gap in large batch training of neural networks

Elad Hoffer, Itay Hubara, and Daniel Soudry. Train longer, generalize better: Closing the generalization gap in large batch training of neural networks. InNeurIPS, 2017

2017

[24] [24]

Hu et al

Edward J. Hu et al. LoRA: Low-rank adaptation of large language models. InICLR, 2022

2022

[25] [25]

Preventing generation of verbatim memorization in language models gives a false sense of privacy

Daphne Ippolito et al. Preventing generation of verbatim memorization in language models gives a false sense of privacy. InINLG, 2023

2023

[26] [26]

Measuring forgetting of memorized training examples

Matthew Jagielski et al. Measuring forgetting of memorized training examples. InICLR, 2023

2023

[27] [27]

User inference attacks on large language models

Nikhil Kandpal et al. User inference attacks on large language models. InEMNLP, 2024

2024

[28] [28]

Deduplicating training data mitigates privacy risks in language models

Nikhil Kandpal, Eric Wallace, and Colin Raffel. Deduplicating training data mitigates privacy risks in language models. InICML, 2022

2022

[29]

AnonShield: Scalable on-premise pseudonymization for CSIRT network vulnerability data

Cristhian Kapelinski et al. AnonShield: Scalable on-premise pseudonymization for CSIRT network vulnerability data. In SBRC, 2026

2026

[30]

Integrating large language models into security incident response

Krämer et al. Integrating large language models into security incident response. In SOUPS, 2025

2025

[31]

HMAC: Keyed-hashing for message authentication

Hugo Krawczyk, Mihir Bellare, and Ran Canetti. HMAC: Keyed-hashing for message authentication. IETF RFC 2104, 1997

1997

[32]

Efficient memory management for large language model serving with PagedAttention

Woosuk Kwon et al. Efficient memory management for large language model serving with PagedAttention. In SOSP, 2023

2023

[33]

Large language models can be strong differentially private learners

Xuechen Li et al. Large language models can be strong differentially private learners. In ICLR, 2022

2022

[34]

Holistic evaluation of language models

Percy Liang et al. Holistic evaluation of language models. TMLR, 2023

2023

[35]

IRCopilot: Automated incident response with large language models

Xihuan Lin et al. IRCopilot: Automated incident response with large language models. arXiv:2505.20945, 2025

2025

[36]

Analyzing leakage of personally identifiable information in language models

Nils Lukas et al. Analyzing leakage of personally identifiable information in language models. In IEEE S&P, 2023

2023

[37]

Small batch size training for language models: When vanilla SGD works, and why gradient accumulation is wasteful

Martin Marek et al. Small batch size training for language models: When vanilla SGD works, and why gradient accumulation is wasteful. In NeurIPS, 2025

2025

[38]

An empirical analysis of memorization in fine-tuned autoregressive language models

Fatemehsadat Mireshghallah et al. An empirical analysis of memorization in fine-tuned autoregressive language models. In EMNLP, 2022

2022

[39]

Adversary instantiation: Lower bounds for differentially private machine learning

Milad Nasr et al. Adversary instantiation: Lower bounds for differentially private machine learning. In IEEE S&P, 2021

2021

[40]

NIST SP 800-61r3: Incident response recommendations and considerations for cybersecurity risk management

Alex Nelson et al. NIST SP 800-61r3: Incident response recommendations and considerations for cybersecurity risk management. Technical report, NIST, 2025

2025

[41]

NIST updates NVD operations to address record CVE growth

NIST. NIST updates NVD operations to address record CVE growth, 2026

2026

[42]

On collaboration and automation in the context of threat detection and response with privacy-preserving features (SAPPAN)

Lukas Nitz et al. On collaboration and automation in the context of threat detection and response with privacy-preserving features (SAPPAN). Digit. Threats, 6(1), 2025

2025

[43]

Qwen3 Technical Report

Qwen Team. Qwen3 technical report. Technical report, Alibaba Group, 2025. arXiv:2505.09388

2025

[44]

Subsampling is not magic: Why large batch sizes work for differentially private stochastic optimisation

Ossi Räisä, Joonas Jälkö, and Antti Honkela. Subsampling is not magic: Why large batch sizes work for differentially private stochastic optimisation. In ICML, 2024

2024

[45]

Hackphyr: A local fine-tuned LLM agent for network security environments

Maria Rigaki, Carlos Catania, and Sebastian Garcia. Hackphyr: A local fine-tuned LLM agent for network security environments. arXiv:2409.11276, 2024

2024

[46]

Quantifying language models’ sensitivity to spurious features in prompt design

Melanie Sclar et al. Quantifying language models’ sensitivity to spurious features in prompt design. In ICLR, 2024

2024

[47]

LLMs e engenharia de prompt para classificação automatizada de incidentes em SOCs

Alex S. Severo et al. LLMs e engenharia de prompt para classificação automatizada de incidentes em SOCs. In SBSeg, 2025

2025

[48]

Detecting pretraining data from large language models

Weijia Shi et al. Detecting pretraining data from large language models. In ICLR, 2024

2024

[49]

Membership inference attacks against machine learning models

Reza Shokri et al. Membership inference attacks against machine learning models. In IEEE S&P, 2017

2017

[50]

LLMs in the SOC: An empirical study of human-AI collaboration in security operations centres

Ronal Singh et al. LLMs in the SOC: An empirical study of human-AI collaboration in security operations centres. arXiv:2508.18947, 2025

2025

[51]

VaultGemma: A differentially private Gemma model

Amit Sinha et al. VaultGemma: A differentially private Gemma model. Technical report, Google, 2025. arXiv:2510.15001

2025

[52]

General and specific utility measures for synthetic data

Joshua Snoke et al. General and specific utility measures for synthetic data. J. R. Stat. Soc. A, 181(3), 2018

2018

[53]

Privacy auditing with one (1) training run

Thomas Steinke, Milad Nasr, and Matthew Jagielski. Privacy auditing with one (1) training run. In NeurIPS, 2023

2023

[54]

Considerations for differentially private learning with large-scale public pretraining

Florian Tramèr, Gautam Kamath, and Nicholas Carlini. Considerations for differentially private learning with large-scale public pretraining. arXiv:2212.06470, 2022

2022

[55]

Leaner training, lower leakage: Revisiting memorization in LLM fine-tuning with LoRA

Fei Wang and Baochun Li. Leaner training, lower leakage: Revisiting memorization in LLM fine-tuning with LoRA. arXiv:2506.20856, 2025

2025

[56]

Generalization vs memorization: Tracing language models’ capabilities back to pretraining data

Xinyi Wang et al. Generalization vs memorization: Tracing language models’ capabilities back to pretraining data. In ICLR, 2025

2025

[57]

Privacy risk in machine learning: Analyzing the connection to overfitting

Samuel Yeom et al. Privacy risk in machine learning: Analyzing the connection to overfitting. In IEEE CSF, 2018

2018

[58]

Opacus: User-friendly differential privacy library in PyTorch

Ashkan Yousefpour et al. Opacus: User-friendly differential privacy library in PyTorch. In NeurIPS Workshop on Privacy in ML, 2021

2021

[59]

Differentially private fine-tuning of language models

Da Yu et al. Differentially private fine-tuning of language models. In ICLR, 2022

2022

[60]

Analyzing information leakage of updates to natural language models

Santiago Zanella-Béguelin et al. Analyzing information leakage of updates to natural language models. In ACM CCS, 2020

2020

[61]

Min-K%++: Improved baseline for pre-training data detection

Jingyang Zhang et al. Min-K%++: Improved baseline for pre-training data detection. In ICLR, 2025

2025