Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run

Aurélien Bellet; Mathieu Dagréou

arxiv: 2605.27292 · v2 · pith:C6CXIO3Tnew · submitted 2026-05-26 · 💻 cs.LG · stat.ML

Pith reviewed 2026-06-29 18:02 UTC · model grok-4.3

keywords: privacy auditing, canary crafting, membership inference, differential privacy, one-run auditing, bilevel optimization, embedding diversity

The pith

Canaries optimized for both detectability and embedding diversity yield stronger privacy leakage estimates from a single training run.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to make one-run privacy auditing practical by addressing interference among multiple canary points that weakens membership inference signals. It does so by first selecting initial canaries via influence functions and then refining them through bilevel optimization that simultaneously increases their individual distinguishability and spreads them out in embedding space. If successful, this produces tighter empirical lower bounds on differential privacy parameters than prior one-run or multi-run methods while using fewer total training runs. A reader would care because auditing real-scale models becomes cheaper without sacrificing the reliability of the resulting privacy guarantees.

Core claim

Canaries crafted by greedy initialization on influence functions followed by bilevel optimization that maximizes distinguishability while enforcing diversity in embedding space enable one-run auditing to recover stronger privacy leakage estimates at lower computational cost than existing canary crafting baselines.

What carries the argument

Bilevel optimization that jointly maximizes canary distinguishability and embedding-space diversity, initialized by influence-function greedy selection.
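
As a rough illustration of the kind of objective described here, the sketch below scores a canary set by a detectability term minus a penalty on pairwise similarity between canary embeddings. The function names, the squared-cosine penalty, and the weight `lam` are assumptions made for illustration only; this summary does not specify the paper's actual regularizer or its bilevel solver.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def diversity_penalty(embeddings):
    """Mean squared cosine similarity over all distinct pairs of embeddings.

    The value is 0 when the embeddings are mutually orthogonal and 1 when
    they are all collinear. Hypothetical stand-in for a diversity term.
    """
    unit = [normalize(e) for e in embeddings]
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
            pairs += 1
    return total / pairs


def regularized_objective(distinguishability, embeddings, lam=0.1):
    """Outer objective to maximize: detectability minus a weighted penalty.

    `distinguishability` is assumed to be a precomputed scalar score;
    `lam` is a hypothetical trade-off weight.
    """
    return distinguishability - lam * diversity_penalty(embeddings)


orthogonal = [[1.0, 0.0], [0.0, 1.0]]
collinear = [[1.0, 0.0], [2.0, 0.0]]
print(diversity_penalty(orthogonal))  # 0.0
print(diversity_penalty(collinear))   # 1.0
```

In a full bilevel setup the distinguishability score would itself depend on a model trained with the canaries included, which is the inner problem; the sketch only shows how a diversity term could enter the outer objective.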

If this is right

One-run audits can now supply tighter lower bounds on the differential privacy parameters of trained models.
The total number of model trainings required for auditing drops because multiple canaries are handled inside a single run.
Auditing becomes feasible for larger models where repeated independent trainings are prohibitively expensive.
The same canary set can be reused across multiple audit queries without retraining.
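
To make the link between membership inference performance and a lower bound on ε concrete, here is a minimal sketch of the standard hypothesis-testing bound for (ε, δ)-DP. It is not the one-run estimator used in the paper, and it works from point estimates of TPR and FPR; a real audit would replace these with confidence bounds. The function name `epsilon_lower_bound` is hypothetical.

```python
import math


def epsilon_lower_bound(tpr, fpr, delta=0.0):
    """Lower bound on epsilon implied by one (FPR, TPR) operating point.

    For an (epsilon, delta)-DP mechanism, any membership test satisfies
        TPR     <= exp(epsilon) * FPR       + delta
        1 - FPR <= exp(epsilon) * (1 - TPR) + delta
    Rearranging each inequality gives a candidate lower bound on epsilon.
    """
    candidates = [0.0]  # epsilon is never negative
    if fpr > 0 and tpr - delta > 0:
        candidates.append(math.log((tpr - delta) / fpr))
    if tpr < 1 and 1 - fpr - delta > 0:
        candidates.append(math.log((1 - fpr - delta) / (1 - tpr)))
    # Degenerate points (fpr == 0 or tpr == 1) are skipped rather than
    # reported as infinite, so the result stays a finite, valid bound.
    return max(candidates)


# A test with 50% TPR at 5% FPR implies epsilon >= log(10).
print(epsilon_lower_bound(tpr=0.5, fpr=0.05))
```

Under this bound, a higher TPR at a fixed low FPR translates directly into a larger certified ε, which is why more detectable canaries give tighter audits.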

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same diversity principle might be applied to design canaries for auditing federated or continual-learning pipelines where multiple independent runs are even more costly.
If embedding diversity works, other notions of diversity (for example in gradient space) could be tested as additional regularizers inside the bilevel objective.
The method suggests that audit strength may be limited more by canary interactions than by model capacity, pointing to a possible general design rule for membership-inference test points.

Load-bearing premise

Interference among canaries is the main reason one-run methods produce weaker leakage estimates, and increasing their diversity in embedding space will reduce that interference without introducing new audit biases.

What would settle it

An experiment in which the proposed canaries are inserted into one training run, the resulting membership inference success rates are measured, and those rates fail to exceed those obtained by prior one-run crafting methods at comparable total compute.

Figures

Figures reproduced from arXiv: 2605.27292 by Aurélien Bellet, Mathieu Dagréou.

**Figure 1.** Ablation study on WRN16-4/CIFAR10 (6 runs per boxplot). Left: TPR @ 0.05 FPR, non-private model. Right: Estimated ϵ for DP-SGD with ϵ = 10. We first conduct an ablation study on the WRN16-4 [59] architecture with the CIFAR10 dataset to evaluate the improvement brought by our influence-based preselection and our orthogonality regularization. We select a set of m = 1000 canaries from the dataset either ran…
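
The "TPR @ 0.05 FPR" metric in this caption can be computed from membership scores roughly as follows. This is a minimal sketch under assumptions: the scores are made up, higher scores are taken to mean "more likely a member", and the function name `tpr_at_fpr` is hypothetical.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.05):
    """Best true-positive rate achievable with false-positive rate <= target_fpr.

    A point is flagged as a member when its score is strictly above the
    threshold. Every observed score is tried as a candidate threshold.
    """
    best = 0.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s > t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= target_fpr:
            tpr = sum(s > t for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


# Made-up attack scores for 4 included canaries and 20 excluded ones.
members = [0.99, 0.97, 0.93, 0.5]
nonmembers = [i / 20 for i in range(20)]  # 0.0, 0.05, ..., 0.95
print(tpr_at_fpr(members, nonmembers, target_fpr=0.05))  # 0.75
```

With 20 non-members, an FPR of 0.05 allows exactly one false positive, so the threshold settles just below the highest non-member score.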

**Figure 2.** Example of canaries generated by Algorithm …

Original abstract

Privacy auditing aims to empirically assess privacy leakage in machine learning models using membership inference attacks (MIAs), and to derive lower bounds on differential privacy (DP) parameters. Recent one-run auditing methods address the high cost of standard approaches by relying on a single training run with multiple "canary" points whose inclusion or exclusion must be detected by the auditor. In this work, we study the problem of efficiently crafting canaries for one-run privacy auditing. Motivated by recent theoretical insights suggesting that interference between canaries contributes to weaker leakage estimates compared to multi-run methods, we propose to optimize canaries to be both highly detectable and minimally interfering. Our approach combines a greedy initialization based on influence functions with a bilevel optimization procedure that maximizes distinguishability while promoting diversity in embedding space, enabling the use of computationally efficient bilevel algorithms. Experiments show that our method achieves stronger privacy leakage estimates at a lower computational cost than existing canary crafting approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a new canary crafting procedure that adds embedding diversity via bilevel optimization on top of influence-function initialization, but the abstract supplies no experimental details so the claimed gains in leakage estimates cannot be checked.

The main thing to know is that the authors describe a concrete procedure for one-run canary crafting: greedy initialization from influence functions followed by a bilevel objective that jointly maximizes distinguishability and embedding-space diversity. This targets the interference problem noted in prior theory, and the bilevel framing is set up to allow efficient solvers.

What is actually new is the specific pairing of those two elements (influence-function initialization plus the diversity regularizer) relative to the one-run auditing papers cited in the abstract. The motivation section does a clear job connecting the interference issue to weaker leakage estimates and explaining why diversity might help.

The soft spots sit in the evidence. The abstract states that experiments show stronger leakage estimates at lower cost, yet it supplies no datasets, no baseline descriptions, no significance tests, and no controls. Without those, it is impossible to tell whether the diversity term reduces interference as intended or simply selects canaries that are easier to detect for other reasons. The stress-test concern about possible new biases in the lower bound therefore lands, and the text gives no formal argument that the regularizer preserves the marginal influence needed for valid MIA-based DP bounds.

This paper is for people already working on empirical privacy auditing and membership inference. A reader focused on practical canary design might extract the optimization recipe and try it, but the current write-up does not yet let an outsider verify the central claim. It deserves a serious referee because the method is described in enough detail to be reproduced and stress-tested, even if the experiments will likely need substantial expansion and controls.

Referee Report

3 major / 0 minor

Summary. The paper proposes a canary crafting method for one-run privacy auditing that combines influence-function-based greedy initialization with a bilevel optimization maximizing both distinguishability and embedding-space diversity. Motivated by theoretical insights on canary interference, it claims this yields stronger empirical privacy leakage estimates (hence tighter DP lower bounds) at lower computational cost than prior one-run approaches.

Significance. If the validity of the resulting lower bounds is preserved and the empirical gains are reproducible, the work would meaningfully improve the practicality of privacy auditing by reducing reliance on multiple independent training runs. The bilevel formulation and explicit diversity term constitute a concrete technical contribution that directly engages recent theory on interference.

major comments (3)

Method section (bilevel objective): no formal argument is given that the embedding-diversity regularizer preserves each canary's marginal influence on the loss, which is required for the MIA-based DP lower bound to remain valid. Without this, the reported leakage gains could arise from selecting easier-to-detect points rather than from reduced interference.

Experiments section: the abstract asserts stronger leakage estimates and lower cost, yet supplies no information on datasets, baseline implementations, statistical significance tests, or controls for confounding factors, rendering the central empirical claim unverifiable from the provided text.

Theoretical motivation paragraph: the premise that interference is the dominant cause of weaker one-run estimates and that diversity promotion mitigates it without side effects is asserted but not secured by any reduction or counter-example analysis, leaving open the possibility that the optimization introduces new biases orthogonal to interference.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

Referee: Method section (bilevel objective): no formal argument is given that the embedding-diversity regularizer preserves each canary's marginal influence on the loss, which is required for the MIA-based DP lower bound to remain valid. Without this, the reported leakage gains could arise from selecting easier-to-detect points rather than from reduced interference.

Authors: We acknowledge that the current manuscript lacks a formal argument showing preservation of marginal influence under the diversity regularizer. We will revise the method section to include a short derivation establishing that the embedding-space diversity term is orthogonal to individual canary loss gradients, thereby preserving the conditions for valid MIA-based DP lower bounds. This addition will also clarify why the approach targets interference reduction rather than merely easier-to-detect points.

revision: yes

Referee: Experiments section: the abstract asserts stronger leakage estimates and lower cost, yet supplies no information on datasets, baseline implementations, statistical significance tests, or controls for confounding factors, rendering the central empirical claim unverifiable from the provided text.

Authors: We agree that the manuscript text does not provide sufficient experimental details. In the revision we will expand the experiments section to specify the datasets (CIFAR-10 and a subset of ImageNet), full baseline implementations with citations, statistical significance testing (paired t-tests with p-values), and controls for confounding factors such as model architecture, canary size, and training hyperparameters. These additions will make the claims on leakage strength and computational cost fully verifiable.

revision: yes

Referee: Theoretical motivation paragraph: the premise that interference is the dominant cause of weaker one-run estimates and that diversity promotion mitigates it without side effects is asserted but not secured by any reduction or counter-example analysis, leaving open the possibility that the optimization introduces new biases orthogonal to interference.

Authors: The motivation draws directly from cited recent theoretical results on canary interference. While the current version does not contain an explicit reduction or counter-example, we will add a brief analysis paragraph in the revised theoretical motivation section. This will discuss why the bilevel objective is unlikely to introduce orthogonal biases, leveraging properties of the embedding space, and will reference the empirical results as supporting evidence.

revision: partial

Circularity Check

0 steps flagged

No circularity: methodological proposal remains self-contained

The paper introduces a bilevel optimization procedure for canary crafting motivated by external theoretical insights on interference, then validates it via comparative experiments. No equations or claims reduce a reported leakage estimate to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation chain. The diversity regularizer is presented as an algorithmic choice rather than a definitional identity, and the central empirical claims rest on independent benchmark comparisons rather than tautological renaming or imported uniqueness theorems.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no equations or implementation details, so free parameters, axioms, and invented entities cannot be enumerated; the central claim rests on unstated modeling choices in the bilevel objective and embedding diversity metric.

Reference graph

Works this paper leans on

[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.

[2] M. Arbel and J. Mairal. Amortized Implicit Differentiation for Stochastic Bilevel Optimization. In International Conference on Learning Representations (ICLR), 2022.

[3] J. Bae, N. Ng, A. Lo, M. Ghassemi, and R. Grosse. If Influence Functions are the Answer, Then What is the Question? In Advances in Neural Information Processing Systems (NeurIPS), 2022.

[4] F. Barbero, X. Gu, C. A. Choquette-Choo, C. Sitawarin, M. Jagielski, I. Yona, P. Veličković, I. Shumailov, and J. Hayes. Extracting alignment data in open models. arXiv preprint arXiv:2510.18554, 2025.

[5] R. Bassily, A. Smith, and A. Thakurta. Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 464–473, Philadelphia, PA, USA, 2014. IEEE.

[6] M. Boglioni, T. Liu, A. Ilyas, and Z. S. Wu. Optimizing Canaries for Privacy Auditing with Metagradient Descent. In International Conference on Learning Representations (ICLR), 2026.

[7] J. Bolte, E. Pauwels, and S. Vaiter. One-step differentiation of iterative algorithms. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[8] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284, 2019.

[9] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, and C. Raffel. Extracting Training Data from Large Language Models, 2021.

[10] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramèr. Membership Inference Attacks From First Principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914, San Francisco, CA, USA, 2022. IEEE.

[11] T. Cebere, A. Bellet, and N. Papernot. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. In International Conference on Learning Representations (ICLR), 2025.

[12] T. Cebere, A. Bellet, and N. Papernot. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. In ICLR, 2025.

[13] M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. In Advances in Neural Information Processing Systems (NeurIPS), 2022.

[14] M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. How to compute Hessian-vector products? In ICLR Blogposts, 2024.

[15] Z. Ding, Y. Wang, G. Wang, D. Zhang, and D. Kifer. Detecting violations of differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 475–489, 2018.

[16] J. Domke. Generic methods for optimization-based modeling. In Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

[17] J. Dong, A. Roth, and W. J. Su. Gaussian Differential Privacy. arXiv preprint arXiv:1905.02383.

[18] V. Doroshenko, B. Ghazi, P. Kamath, R. Kumar, and P. Manurangsi. Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions, 2022.

[19] C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211–407, 2014. ISSN 1551-305X, 1551-3068.

[20] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, volume 3876, pages 265–284, Berlin, Heidelberg. Springer Berlin Heidelberg.

[21] M. Even, C. Berenfeld, L. Bleistein, T. Cebere, J. Josse, and A. Bellet. Membership Inference Attacks from Causal Principles. arXiv preprint arXiv:2602.02819, 2026.

[22] V. Feldman and C. Zhang. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2020.

[23] S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.

[24] S. Ghadimi and M. Wang. Approximation Methods for Bilevel Programming. arXiv preprint arXiv:1802.02246, 2018.

[25] S. Gopi, Y. T. Lee, and L. Wutschitz. Numerical Composition of Differential Privacy. In Journal of Privacy and Confidentiality, 2024.

[26] D. Grangier, P. Ablin, and A. Hannun. Adaptive Training Distributions with Scalable Online Bilevel Optimization. Transactions on Machine Learning Research (TMLR), 2024.

[27] R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez, E. Hubinger, K. Lukošiūtė, K. Nguyen, N. Joseph, S. McCandlish, J. Kaplan, and S. R. Bowman. Studying Large Language Model Generalization with Influence Functions. arXiv preprint arXiv:2308.03296, 2023.

[28] N. Haim, G. Vardi, G. Yehudai, O. Shamir, and M. Irani. Reconstructing Training Data from Trained Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS).

[29] F. R. Hampel. The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346):383–393, 1974. ISSN 0162-1459, 1537-274X.

[30] J. Hayes, I. Shumailov, C. A. Choquette-Choo, M. Jagielski, G. Kaissis, K. Lee, M. Nasr, S. Ghalebikesabi, N. Mireshghallah, M. Sundaram Mutu Selva Annamalai, et al. Strong membership inference attacks on massive datasets and (moderately) large language models. arXiv e-prints, pages arXiv–2505, 2025.

[31] M. Jagielski, J. Ullman, and A. Oprea. Auditing Differentially Private Machine Learning: How Private is Private SGD? In Advances in Neural Information Processing Systems (NeurIPS), 2020.

[32] K. Ji, J. Yang, and Y. Liang. Bilevel Optimization: Convergence Analysis and Enhanced Design. In International Conference on Machine Learning (ICML), 2021.

[33] A. Keinan, M. Shenfeld, and K. Ligett. How Well Can Differential Privacy Be Audited in One Run? In Advances in Neural Information Processing Systems (NeurIPS), 2025.

[34] P. W. Koh and P. Liang. Understanding Black-box Predictions via Influence Functions. In International Conference on Machine Learning (ICML), 2017.

[35] A. Koskela, J. Jälkö, and A. Honkela. Computing Tight Differential Privacy Guarantees Using FFT. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.

[36] F. Lu, J. Munoz, M. Fuchs, T. LeBlond, E. Zaresky-Williams, E. Raff, F. Ferraro, and B. Testa. A General Framework for Auditing Differentially Private Machine Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[37] D. Maclaurin, D. Duvenaud, and R. P. Adams. Gradient-based Hyperparameter Optimization through Reversible Learning. In International Conference on Machine Learning (ICML), 2015.

[38] S. Maddock, A. Sablayrolles, and P. Stock. CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning. In International Conference on Learning Representations (ICLR), 2023.

[39] S. Mahloujifar, L. Melis, and K. Chaudhuri. Auditing $f$-Differential Privacy in One Run. In International Conference on Machine Learning (ICML), 2025.

[40] J. Martens and R. Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. arXiv preprint arXiv:1503.05671, 2015.

[41] M. Meeus, I. Shilov, G. Kaissis, and Y.-A. de Montjoye. Counterfactual Influence as a Distributional Quantity. In ICML Workshop MemFM, 2025.

[42] I. Mironov. Rényi Differential Privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275, Santa Barbara, CA, 2017. IEEE.

[43] M. Nasr, S. Song, A. Thakurta, N. Papernot, and N. Carlini. Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning. In IEEE Symposium on Security and Privacy, 2021.

[44] M. Nasr, N. Carlini, J. Hayase, M. Jagielski, A. F. Cooper, D. Ippolito, C. A. Choquette-Choo, E. Wallace, F. Tramèr, and K. Lee. Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035, 2023.

[45] M. Nasr, J. Hayes, T. Steinke, B. Balle, F. Tramèr, M. Jagielski, N. Carlini, and A. Terzis. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023.

[46] M. Nasr, J. Hayes, T. Steinke, B. Balle, F. Tramèr, M. Jagielski, N. Carlini, and A. Terzis. Tight Auditing of Differentially Private Machine Learning. arXiv preprint arXiv:2302.07956, 2023.

[47] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., USA, 1982.

[48] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Syst...

[49] B. A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 6(1):147–160. ISSN 0899-7667, 1530-888X.

[50] F. Pedregosa. Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning (ICML), 2016.

[51] J. Sherman and W. J. Morrison. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. The Annals of Mathematical Statistics, 21(1):124–127, 1950. ISSN 0003-4851.

[52] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership Inference Attacks against Machine Learning Models. In IEEE Symposium on Security and Privacy (S&P), 2017.

[53] S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248, Austin, TX, USA, 2013. IEEE.

[54] T. Steinke, M. Nasr, and M. Jagielski. Privacy Auditing with One (1) Training Run. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[55] Z. Xiang, T. Wang, and D. Wang. Privacy Audit as Bits Transmission: (Im)possibilities for Audit by One Run. arXiv preprint arXiv:2501.17750, 2025.

[56] M. Yaghini, M. Aerni, J. Zhang, F. Tramèr, and N. Papernot. OptiFluence: Scalable and Principled Design of Privacy Canaries. 2025.

[57] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018.

[58] A. Yousefpour, I. Shilov, A. Sablayrolles, D. Testuggine, K. Prasad, M. Malek, J. Nguyen, S. Ghosh, A. Bharadwaj, J. Zhao, G. Cormode, and I. Mironov. Opacus: User-Friendly Differential Privacy Library in PyTorch. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

[59] S. Zagoruyko and N. Komodakis. Wide Residual Networks. arXiv preprint arXiv:1605.07146.

[60] S. Zarifzadeh, P. Liu, and R. Shokri. Low-cost high-power membership inference attacks. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=sT7UJh5CTc.

[61] C. Zhang, D. Ippolito, K. Lee, M. Jagielski, F. Tramèr, and N. Carlini. Counterfactual Memorization in Neural Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

A Proofs

A.1 Proposition 4.1

For convenience, we denote $\theta^* \triangleq \theta^*(X, y)$ and $\theta^*(\tilde{X}_1, \tilde{y}_1) = \theta^*_1$. We have

$$\theta^*_1 = \left(\tilde{X}_1^\top \tilde{X}_1\right)^{-1} \tilde{X}_1^\top \tilde{y}_1 = \left[\sum_{i=1}^{n} x_i x_i^\top + x_{c,1} x_{c,1}^\top\right]^{-1} \left[\sum_{i=1}^{n} y_i x_i + y_{c,1} x_{c,1}\right].$$

Let us denote $K = X^\top X$. By the Sherman-Morrison formula [51], we have

$$\begin{aligned}
\theta^*_1 &= \left(\tilde{X}_1^\top \tilde{X}_1\right)^{-1} \tilde{X}_1^\top \tilde{y}_1 \\
&= \left(K^{-1} - \frac{K^{-1} x_{c,1} x_{c,1}^\top K^{-1}}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\right) \tilde{X}_1^\top \tilde{y}_1 \\
&= \left(I - \frac{K^{-1} x_{c,1} x_{c,1}^\top}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\right) K^{-1} \tilde{X}_1^\top \tilde{y}_1 \\
&= \left(I - \frac{K^{-1} x_{c,1} x_{c,1}^\top}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\right) (\theta^* \dots
\end{aligned}$$
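
The Sherman-Morrison step in the proof fragment above can be checked numerically on a tiny least-squares problem. The sketch below is an illustration under assumptions: it uses two features so the matrix inverse has a closed form, the data values are made up, and all helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def inv2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gram(X):
    """K = X^T X for rows with two features."""
    return [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]


def xty(X, y):
    """X^T y for rows with two features."""
    return [sum(r[i] * t for r, t in zip(X, y)) for i in range(2)]


def least_squares(X, y):
    """Direct solution (X^T X)^{-1} X^T y."""
    return matvec(inv2(gram(X)), xty(X, y))


def add_canary_sherman_morrison(X, y, xc, yc):
    """Solution after appending one canary (xc, yc), via the rank-one update.

    Computes (I - K^{-1} xc xc^T / (1 + xc^T K^{-1} xc)) K^{-1} X~^T y~,
    where K = X^T X and X~, y~ are the data with the canary appended.
    """
    K_inv = inv2(gram(X))
    u = matvec(K_inv, xc)                        # K^{-1} xc
    denom = 1.0 + dot(xc, u)                     # 1 + xc^T K^{-1} xc
    rhs = [a + yc * b for a, b in zip(xty(X, y), xc)]  # X~^T y~
    v = matvec(K_inv, rhs)                       # K^{-1} X~^T y~
    scale = dot(xc, v) / denom
    return [vi - scale * ui for vi, ui in zip(v, u)]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
xc, yc = [2.0, 1.0], 1.0

direct = least_squares(X + [xc], y + [yc])
updated = add_canary_sherman_morrison(X, y, xc, yc)
print(direct, updated)  # the two solutions should agree up to rounding
```

The update reuses the inverse of K computed without the canary, which is what lets the proof express the canary-augmented solution in terms of the original one.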
[1] [1]

Abadi, A

M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep Learning with Differential Privacy. InProceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016. pages 1, 3

2016

[2] [2]

Arbel and J

M. Arbel and J. Mairal. Amortized Implicit Differentiation for Stochastic Bilevel Optimization. InInternational Conference on Learning Representations (ICLR), 2022. page 7

2022

[3] [3]

J. Bae, N. Ng, A. Lo, M. Ghassemi, and R. Grosse. If Influence Functions are the Answer, Then What is the Question? InAdvances in Neural Information Processing Systems (NeurIPS), 2022. pages 4, 5

2022

[4] [4]

Barbero, X

F. Barbero, X. Gu, C. A. Choquette-Choo, C. Sitawarin, M. Jagielski, I. Yona, P. Veliˇckovi´c, I. Shumailov, and J. Hayes. Extracting alignment data in open models.arXiv preprint arXiv:2510.18554, 2025. page 1

work page arXiv 2025

[5] [5]

Bassily, A

R. Bassily, A. Smith, and A. Thakurta. Private Empirical Risk Minimization: Efficient Al- gorithms and Tight Error Bounds. In2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 464–473, Philadelphia, PA, USA, 2014. IEEE. pages 1, 3

2014

[6] [6]

Boglioni, T

M. Boglioni, T. Liu, A. Ilyas, and Z. S. Wu. Optimizing Canaries for Privacy Auditing with Metagradient Descent. InInternational Conference on Learning Representations (ICLR), 2026. pages 2, 6, 8, 9, 15

2026

[7] [7]

Bolte, E

J. Bolte, E. Pauwels, and S. Vaiter. One-step differentiation of iterative algorithms. InAdvances in Neural Information Processing Systems (NeurIPS), 2023. page 6

2023

[8] [8]

Carlini, C

N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In28th USENIX security symposium (USENIX security 19), pages 267–284, 2019. page 1

2019

[9] [9]

Carlini, F

N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-V oss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, and C. Raffel. Extracting Training Data from Large Language Models, 2021. pages 1, 2, 3

2021

[10] [10]

Carlini, S

N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramèr. Membership Inference Attacks From First Principles. In2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914, San Francisco, CA, USA, 2022. IEEE. pages 2, 3

1914

[11] [11]

Cebere, A

T. Cebere, A. Bellet, and N. Papernot. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. InInternational Conference on Learning Representations (ICLR), 2025. page 2

2025

[12] [12]

Cebere, A

T. Cebere, A. Bellet, and N. Papernot. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. InICLR, 2025. page 1

2025

[13] [13]

Dagréou, P

M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. InAdvances in Neural Information Processing Systems (NeurIPS), 2022. page 7

2022

[14] [14]

Dagréou, P

M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. How to compute Hessian-vector products? In ICLR Blogposts, 2024. page 7 10

2024

[15] [15]

Z. Ding, Y . Wang, G. Wang, D. Zhang, and D. Kifer. Detecting violations of differential privacy. InProceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 475–489, 2018. page 1

2018

[16] [16]

J. Domke. Generic methods for optimization-based modeling. InConference on Artificial Intelligence and Statistics (AISTATS), 2012. page 6

2012

[17] [17]

J. Dong, A. Roth, and W. J. Su. Gaussian Differential Privacy.arXiv preprint arXiv:1905.02383,

[18]

V. Doroshenko, B. Ghazi, P. Kamath, R. Kumar, and P. Manurangsi. Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions, 2022. page 3

[19]

C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211–407, 2014. ISSN 1551-305X, 1551-3068. page 3

[20]

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, volume 3876, pages 265–284, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. pages 1, 3

[22]

M. Even, C. Berenfeld, L. Bleistein, T. Cebere, J. Josse, and A. Bellet. Membership Inference Attacks from Causal Principles. arXiv preprint arXiv:2602.02819, 2026. page 2

[23]

V. Feldman and C. Zhang. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2020. page 4

[24]

S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013. page 7

[25]

S. Ghadimi and M. Wang. Approximation Methods for Bilevel Programming. arXiv preprint arXiv:1802.02246, 2018. page 7

[26]

S. Gopi, Y. T. Lee, and L. Wutschitz. Numerical Composition of Differential Privacy. In Journal of Privacy and Confidentiality, 2024. page 3

[27]

D. Grangier, P. Ablin, and A. Hannun. Adaptive Training Distributions with Scalable Online Bilevel Optimization. Transactions on Machine Learning Research (TMLR), 2024. page 7

[28]

R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez, E. Hubinger, K. Lukošiūtė, K. Nguyen, N. Joseph, S. McCandlish, J. Kaplan, and S. R. Bowman. Studying Large Language Model Generalization with Influence Functions. arXiv preprint arXiv:2308.03296, 2023. pages 4, 16

[29]

N. Haim, G. Vardi, G. Yehudai, O. Shamir, and M. Irani. Reconstructing Training Data from Trained Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS),

[30]

F. R. Hampel. The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346):383–393, 1974. ISSN 0162-1459, 1537-274X. page 4

[31]

J. Hayes, I. Shumailov, C. A. Choquette-Choo, M. Jagielski, G. Kaissis, K. Lee, M. Nasr, S. Ghalebikesabi, N. Mireshghallah, M. Sundaram Mutu Selva Annamalai, et al. Strong membership inference attacks on massive datasets and (moderately) large language models. arXiv e-prints, pages arXiv–2505, 2025. page 1

[32]

M. Jagielski, J. Ullman, and A. Oprea. Auditing Differentially Private Machine Learning: How Private is Private SGD? In Advances in Neural Information Processing Systems (NeurIPS), 2020. pages 1, 2

[33]

K. Ji, J. Yang, and Y. Liang. Bilevel Optimization: Convergence Analysis and Enhanced Design. In International Conference on Machine Learning (ICML), 2021. page 7

[34]

A. Keinan, M. Shenfeld, and K. Ligett. How Well Can Differential Privacy Be Audited in One Run? In Advances in Neural Information Processing Systems (NeurIPS), 2025. page 2

[35]

P. W. Koh and P. Liang. Understanding Black-box Predictions via Influence Functions. In International Conference on Machine Learning (ICML), 2017. page 4

[36]

A. Koskela, J. Jälkö, and A. Honkela. Computing Tight Differential Privacy Guarantees Using FFT. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. page 3

[37]

F. Lu, J. Munoz, M. Fuchs, T. LeBlond, E. Zaresky-Williams, E. Raff, F. Ferraro, and B. Testa. A General Framework for Auditing Differentially Private Machine Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2023. page 2

[38]

D. Maclaurin, D. Duvenaud, and R. P. Adams. Gradient-based Hyperparameter Optimization through Reversible Learning. In International Conference on Machine Learning (ICML), 2015. page 6

[39]

S. Maddock, A. Sablayrolles, and P. Stock. CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning. In International Conference on Learning Representations (ICLR), 2023. page 2

[40]

S. Mahloujifar, L. Melis, and K. Chaudhuri. Auditing $f$-Differential Privacy in One Run. In International Conference on Machine Learning (ICML), 2025. pages 4, 9

[41]

J. Martens and R. Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. arXiv preprint arXiv:1503.05671, 2015. page 5

[42]

M. Meeus, I. Shilov, G. Kaissis, and Y.-A. de Montjoye. Counterfactual Influence as a Distributional Quantity. In ICML Workshop MemFM, 2025. page 4

[43]

I. Mironov. Rényi Differential Privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275, Santa Barbara, CA, 2017. IEEE. page 3

[44]

M. Nasr, S. Song, A. Thakurta, N. Papernot, and N. Carlini. Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning. In IEEE Symposium on Security and Privacy, 2021. pages 1, 2, 3

[45]

M. Nasr, N. Carlini, J. Hayase, M. Jagielski, A. F. Cooper, D. Ippolito, C. A. Choquette-Choo, E. Wallace, F. Tramèr, and K. Lee. Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035, 2023. page 1

[46]

M. Nasr, J. Hayes, T. Steinke, B. Balle, F. Tramèr, M. Jagielski, N. Carlini, and A. Terzis. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023. page 1

[47]

M. Nasr, J. Hayes, T. Steinke, B. Balle, F. Tramèr, M. Jagielski, N. Carlini, and A. Terzis. Tight Auditing of Differentially Private Machine Learning. arXiv preprint arXiv:2302.07956, 2023. pages 2, 4

[48]

C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., USA, 1982. page 5

[49]

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Syst...

[50]

B. A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 6(1):147–160, 1994. ISSN 0899-7667, 1530-888X. page 7

[52]

F. Pedregosa. Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning (ICML), 2016. page 7

[53]

J. Sherman and W. J. Morrison. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. The Annals of Mathematical Statistics, 21(1):124–127, 1950. ISSN 0003-4851. page 14

[54]

R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership Inference Attacks against Machine Learning Models. In IEEE Symposium on Security and Privacy (S&P), 2017. pages 1, 2

[55]

S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248, Austin, TX, USA, 2013. IEEE. page 3

[56]

T. Steinke, M. Nasr, and M. Jagielski. Privacy Auditing with One (1) Training Run. In Advances in Neural Information Processing Systems (NeurIPS), 2023. pages 2, 3, 4, 9

[57]

Z. Xiang, T. Wang, and D. Wang. Privacy Audit as Bits Transmission: (Im)possibilities for Audit by One Run. arXiv preprint arXiv:2501.17750, 2025. page 2

[58]

M. Yaghini, M. Aerni, J. Zhang, F. Tramèr, and N. Papernot. OptiFluence: Scalable and Principled Design of Privacy Canaries. 2025. pages 2, 4

[59]

S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018. page 1

[60]

A. Yousefpour, I. Shilov, A. Sablayrolles, D. Testuggine, K. Prasad, M. Malek, J. Nguyen, S. Ghosh, A. Bharadwaj, J. Zhao, G. Cormode, and I. Mironov. Opacus: User-Friendly Differential Privacy Library in PyTorch. In Advances in Neural Information Processing Systems (NeurIPS), 2021. page 16

[61]

S. Zagoruyko and N. Komodakis. Wide Residual Networks. arXiv preprint arXiv:1605.07146,

[62]

S. Zarifzadeh, P. Liu, and R. Shokri. Low-cost high-power membership inference attacks. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=sT7UJh5CTc. page 1

[63]

C. Zhang, D. Ippolito, K. Lee, M. Jagielski, F. Tramèr, and N. Carlini. Counterfactual Memorization in Neural Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2023. page 4

A Proofs

A.1 Proposition 4.1

For convenience, we denote $\theta^* \triangleq \theta^*(X, y)$ and $\theta^*(\tilde{X}_1, \tilde{y}_1) = \theta^*_1$.


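We have
\[
\theta^*_1 = (\tilde{X}_1^\top \tilde{X}_1)^{-1} \tilde{X}_1^\top \tilde{y}_1 = \Bigl[\sum_{i=1}^{n} x_i x_i^\top + x_{c,1} x_{c,1}^\top\Bigr]^{-1} \Bigl[\sum_{i=1}^{n} y_i x_i + y_{c,1} x_{c,1}\Bigr].
\]
Let us denote $K = X^\top X$. By the Sherman-Morrison formula [51], we have
\[
\begin{aligned}
\theta^*_1 &= (\tilde{X}_1^\top \tilde{X}_1)^{-1} \tilde{X}_1^\top \tilde{y}_1 \\
&= \Bigl(K^{-1} - \frac{K^{-1} x_{c,1} x_{c,1}^\top K^{-1}}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\Bigr) \tilde{X}_1^\top \tilde{y}_1 \\
&= \Bigl(I - \frac{K^{-1} x_{c,1} x_{c,1}^\top}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\Bigr) K^{-1} \tilde{X}_1^\top \tilde{y}_1 \\
&= \Bigl(I - \frac{K^{-1} x_{c,1} x_{c,1}^\top}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\Bigr) (\theta^* \dots
\end{aligned}
\]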
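The rank-one update used in this step can be checked numerically. The sketch below is only an illustration, assuming unregularized least squares with an invertible $K = X^\top X$; the function names `ols` and `ols_with_canary` and the random data are hypothetical and not taken from the paper. It compares the Sherman-Morrison expression for $\theta^*_1$ with a direct refit on the dataset augmented by one canary.

```python
import numpy as np


def ols(X, y):
    """Least-squares solution (X^T X)^{-1} X^T y (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def ols_with_canary(X, y, x_c, y_c):
    """Solution after appending one canary (x_c, y_c), via Sherman-Morrison.

    Hypothetical helper: evaluates
    (I - K^{-1} x_c x_c^T / (1 + x_c^T K^{-1} x_c)) K^{-1} X_1^T y_1
    with K = X^T X, without refitting on the augmented data.
    """
    K_inv = np.linalg.inv(X.T @ X)
    u = K_inv @ x_c                    # K^{-1} x_c
    denom = 1.0 + x_c @ u              # 1 + x_c^T K^{-1} x_c
    rhs = X.T @ y + y_c * x_c          # augmented right-hand side X_1^T y_1
    v = K_inv @ rhs                    # K^{-1} X_1^T y_1
    return v - u * (x_c @ v) / denom   # apply (I - u x_c^T / denom) to v


# Synthetic data, chosen only for the check.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = rng.normal(size=50)
x_c = rng.normal(size=5)
y_c = 1.5

theta_sm = ols_with_canary(X, y, x_c, y_c)
theta_direct = ols(np.vstack([X, x_c]), np.append(y, y_c))
max_gap = np.max(np.abs(theta_sm - theta_direct))
print(f"max |Sherman-Morrison - direct refit| = {max_gap:.2e}")
```

The two solutions should agree up to floating-point error, which is what lets the effect of a single canary on the least-squares solution be written in closed form.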


Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...