Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run
Pith reviewed 2026-06-29 18:02 UTC · model grok-4.3
The pith
Canaries optimized for both detectability and embedding diversity yield stronger privacy leakage estimates from a single training run.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Canaries crafted by greedy initialization based on influence functions, followed by bilevel optimization that maximizes distinguishability while promoting diversity in embedding space, enable one-run audits to recover stronger privacy leakage estimates at lower computational cost than existing canary-crafting baselines.
What carries the argument
Bilevel optimization that jointly maximizes canary distinguishability and embedding-space diversity, initialized by influence-function greedy selection.
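The two terms of this objective can be made concrete with a toy sketch. Everything in it is an assumption rather than the paper's method:

- the inner training problem is ridge regression solved in closed form;
- distinguishability is proxied by each canary's leave-one-out loss gap;
- embeddings are the raw features (`embed` is the identity);
- diversity is scored by a hypothetical `diversity_penalty` (the mean absolute pairwise cosine similarity), weighted by `beta`.

The sketch only evaluates the outer objective. A bilevel method would instead optimize the canaries through hypergradients of this quantity.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    """Inner problem (assumed): closed-form ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_loss_gaps(X, y, Xc, yc, lam=1e-2):
    """Hypothetical distinguishability proxy: each canary's squared loss under the
    model trained without it, minus its loss under the model trained with all canaries."""
    X_all = np.vstack([X, Xc])
    y_all = np.concatenate([y, yc])
    theta_in = ridge_fit(X_all, y_all, lam)
    gaps = np.empty(len(yc))
    for j in range(len(yc)):
        keep = np.ones(len(y_all), dtype=bool)
        keep[len(y) + j] = False  # drop only canary j
        theta_out = ridge_fit(X_all[keep], y_all[keep], lam)
        gaps[j] = (Xc[j] @ theta_out - yc[j]) ** 2 - (Xc[j] @ theta_in - yc[j]) ** 2
    return gaps

def diversity_penalty(E):
    """Hypothetical regularizer: mean absolute cosine similarity between distinct embeddings."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    S = np.abs(En @ En.T)
    off_diag = ~np.eye(len(E), dtype=bool)
    return S[off_diag].mean()

def outer_objective(X, y, Xc, yc, embed=lambda Z: Z, beta=1.0, lam=1e-2):
    """Outer objective to maximize: distinguishability minus beta times the diversity penalty."""
    return loo_loss_gaps(X, y, Xc, yc, lam).mean() - beta * diversity_penalty(embed(Xc))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 0.1 * rng.normal(size=50)  # background data with no real signal
diverse = np.array([[3.0, 0, 0, 0, 0], [0, 3.0, 0, 0, 0]])
duplicate = np.array([[3.0, 0, 0, 0, 0], [3.0, 0, 0, 0, 0]])
yc = np.array([5.0, 5.0])
print(outer_objective(X, y, diverse, yc), outer_objective(X, y, duplicate, yc))
```

In this toy, the duplicated canaries mask each other. Dropping one barely changes the model because its copy remains, so the leave-one-out gaps shrink. At the same time, the diversity penalty is at its maximum. This is the kind of interference the load-bearing premise describes.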
If this is right
- One-run audits can now supply tighter lower bounds on the differential privacy parameters of trained models.
- The total number of model trainings required for auditing drops because multiple canaries are handled inside a single run.
- Auditing becomes feasible for larger models where repeated independent trainings are prohibitively expensive.
- The same canary set can be reused across multiple audit queries without retraining.
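The first consequence rests on how a one-run audit turns guesses into a privacy bound. Below is a minimal sketch of the pure ε-DP (δ = 0) case of the conversion in Steinke et al. [56].

If training is ε-DP, the number of correct guesses among r non-abstaining guesses is stochastically dominated by Binomial(r, e^ε / (1 + e^ε)). The auditor therefore reports the largest ε under which the observed count v would have probability at most β.

The function names, the significance level `beta`, and the search range `eps_max` are illustrative choices. The δ > 0 and f-DP analyses ([56], [40]) need extra terms not shown here.

```python
import math

def tail_prob(r, v, eps):
    """P[Binomial(r, p) >= v] with p = e^eps / (1 + e^eps), summed in log space."""
    log_p = -math.log1p(math.exp(-eps))  # log(e^eps / (1 + e^eps))
    log_q = -math.log1p(math.exp(eps))   # log(1 / (1 + e^eps))
    total = 0.0
    for k in range(v, r + 1):
        log_comb = math.lgamma(r + 1) - math.lgamma(k + 1) - math.lgamma(r - k + 1)
        total += math.exp(log_comb + k * log_p + (r - k) * log_q)
    return total

def eps_lower_bound(r, v, beta=0.05, eps_max=20.0, iters=100):
    """Largest eps in [0, eps_max] such that v or more correct guesses out of r
    would have probability at most beta under eps-DP (pure DP, delta = 0)."""
    if tail_prob(r, v, 0.0) > beta:
        return 0.0  # no evidence beyond random guessing
    lo, hi = 0.0, eps_max  # the tail probability increases with eps
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_prob(r, v, mid) <= beta:
            lo = mid
        else:
            hi = mid
    return lo

print(eps_lower_bound(100, 100))  # all 100 guesses correct
print(eps_lower_bound(100, 50))   # chance-level accuracy
```

Bisection works here because the binomial tail is monotone in ε. Better canaries raise v at fixed r, which is exactly what moves the lower bound up.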
Where Pith is reading between the lines
- The same diversity principle might be applied to design canaries for auditing federated or continual-learning pipelines where multiple independent runs are even more costly.
- If embedding diversity works, other notions of diversity (for example in gradient space) could be tested as additional regularizers inside the bilevel objective.
- The method suggests that audit strength may be limited more by canary interactions than by model capacity, pointing to a possible general design rule for membership-inference test points.
Load-bearing premise
Interference among canaries is the main reason one-run methods produce weaker leakage estimates, and increasing their diversity in embedding space will reduce that interference without introducing new audit biases.
What would settle it
An experiment in which the proposed canaries are inserted into one training run, the resulting membership inference success rates are measured, and those rates fail to exceed those obtained by prior one-run crafting methods at comparable total compute.
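The experiment described above requires turning per-canary membership scores from a single run into guesses. Here is a minimal sketch in the style of one-run auditing (Steinke et al. [56]):

- each canary is included (+1) or excluded (−1) by a secret coin flip;
- the auditor guesses "member" for the `k_pos` highest scores and "non-member" for the `k_neg` lowest;
- the auditor abstains on every other canary.

The score itself (for example, the negative loss of the trained model on each canary), the synthetic data, and the helper names are assumptions for illustration. The resulting pair (r guesses, v correct) is what a one-run analysis converts into an ε lower bound.

```python
import numpy as np

def one_run_guesses(scores, k_pos, k_neg):
    """+1 (member) for the k_pos highest scores, -1 (non-member) for the k_neg
    lowest scores, 0 (abstain) for every other canary."""
    scores = np.asarray(scores, dtype=float)
    m = len(scores)
    if k_pos + k_neg > m:
        raise ValueError("cannot make more guesses than there are canaries")
    order = np.argsort(scores)  # indices in ascending score order
    guesses = np.zeros(m, dtype=int)
    guesses[order[m - k_pos:]] = 1
    guesses[order[:k_neg]] = -1
    return guesses

def count_correct(guesses, inclusion):
    """Return (r, v): number of non-abstaining guesses and number of correct ones."""
    guesses, inclusion = np.asarray(guesses), np.asarray(inclusion)
    made = guesses != 0
    return int(made.sum()), int((guesses[made] == inclusion[made]).sum())

rng = np.random.default_rng(1)
inclusion = rng.choice([-1, 1], size=1000)        # secret coin flip per canary
scores = 0.5 * inclusion + rng.normal(size=1000)  # synthetic, weakly separable scores
r, v = count_correct(one_run_guesses(scores, 100, 100), inclusion)
print(r, v)
```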
Original abstract
Privacy auditing aims to empirically assess privacy leakage in machine learning models using membership inference attacks (MIAs), and to derive lower bounds on differential privacy (DP) parameters. Recent one-run auditing methods address the high cost of standard approaches by relying on a single training run with multiple "canary" points whose inclusion or exclusion must be detected by the auditor. In this work, we study the problem of efficiently crafting canaries for one-run privacy auditing. Motivated by recent theoretical insights suggesting that interference between canaries contributes to weaker leakage estimates compared to multi-run methods, we propose to optimize canaries to be both highly detectable and minimally interfering. Our approach combines a greedy initialization based on influence functions with a bilevel optimization procedure that maximizes distinguishability while promoting diversity in embedding space, enabling the use of computationally efficient bilevel algorithms. Experiments show that our method achieves stronger privacy leakage estimates at a lower computational cost than existing canary crafting approaches.
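The greedy initialization step can be pictured with a small sketch. It works in two stages:

1. Score each candidate by the Koh and Liang [35] self-influence g_i^T (H + λI)^{-1} g_i, built from the gradient of the candidate's loss and a damped Hessian.
2. Pick candidates one at a time, discounting each by its largest absolute cosine similarity to the canaries already chosen.

The similarity discount, the weight `gamma`, the damping constant, the use of gradients as stand-in embeddings, and the function names are all hypothetical. The abstract does not specify how the paper's greedy rule trades influence against diversity.

```python
import numpy as np

def self_influence(G, H, damping=1e-3):
    """s_i = g_i^T (H + damping * I)^{-1} g_i for each row g_i of G."""
    Hd = H + damping * np.eye(H.shape[0])
    Z = np.linalg.solve(Hd, G.T)  # columns are (H + damping I)^{-1} g_i
    return np.einsum("ij,ji->i", G, Z)

def greedy_init(scores, E, k, gamma=1.0):
    """Greedily pick k candidates, penalizing similarity to those already picked."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(len(scores)):
            if i in selected:
                continue
            sim = max((abs(En[i] @ En[j]) for j in selected), default=0.0)
            val = scores[i] - gamma * sim
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
    return selected

G = np.array([[3.0, 0.0], [2.9, 0.0], [0.0, 2.0], [0.5, 0.5]])  # toy per-candidate gradients
H = np.eye(2)                                                   # toy Hessian
scores = self_influence(G, H)
print(greedy_init(scores, G, k=2, gamma=10.0))  # diversity-aware choice
print(greedy_init(scores, G, k=2, gamma=0.0))   # pure influence ranking
```

With `gamma = 0`, the two nearly parallel high-influence candidates are both picked. A large `gamma` swaps the second one for an orthogonal candidate.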
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a canary crafting method for one-run privacy auditing that combines influence-function-based greedy initialization with a bilevel optimization maximizing both distinguishability and embedding-space diversity. Motivated by theoretical insights on canary interference, it claims this yields stronger empirical privacy leakage estimates (hence tighter DP lower bounds) at lower computational cost than prior one-run approaches.
Significance. If the validity of the resulting lower bounds is preserved and the empirical gains are reproducible, the work would meaningfully improve the practicality of privacy auditing by reducing reliance on multiple independent training runs. The bilevel formulation and explicit diversity term constitute a concrete technical contribution that directly engages recent theory on interference.
major comments (3)
- Method section (bilevel objective): no formal argument is given that the embedding-diversity regularizer preserves each canary's marginal influence on the loss, which is required for the MIA-based DP lower bound to remain valid. Without this, the reported leakage gains could arise from selecting easier-to-detect points rather than from reduced interference.
- Experiments section: the abstract asserts stronger leakage estimates and lower cost, yet supplies no information on datasets, baseline implementations, statistical significance tests, or controls for confounding factors, rendering the central empirical claim unverifiable from the provided text.
- Theoretical motivation paragraph: the premise that interference is the dominant cause of weaker one-run estimates, and that diversity promotion mitigates it without side effects, is asserted but not secured by any reduction or counter-example analysis. This leaves open the possibility that the optimization introduces new biases orthogonal to interference.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.
- Referee: Method section (bilevel objective): no formal argument is given that the embedding-diversity regularizer preserves each canary's marginal influence on the loss, which is required for the MIA-based DP lower bound to remain valid. Without this, the reported leakage gains could arise from selecting easier-to-detect points rather than from reduced interference.
Authors: We acknowledge that the current manuscript lacks a formal argument showing preservation of marginal influence under the diversity regularizer. We will revise the method section to include a short derivation establishing that the embedding-space diversity term is orthogonal to individual canary loss gradients, thereby preserving the conditions for valid MIA-based DP lower bounds. This addition will also clarify why the approach targets interference reduction rather than merely easier-to-detect points.
revision: yes
- Referee: Experiments section: the abstract asserts stronger leakage estimates and lower cost, yet supplies no information on datasets, baseline implementations, statistical significance tests, or controls for confounding factors, rendering the central empirical claim unverifiable from the provided text.
Authors: We agree that the manuscript text does not provide sufficient experimental details. In the revision we will expand the experiments section to specify the datasets (CIFAR-10 and a subset of ImageNet), full baseline implementations with citations, statistical significance testing (paired t-tests with p-values), and controls for confounding factors such as model architecture, canary size, and training hyperparameters. These additions will make the claims on leakage strength and computational cost fully verifiable.
revision: yes
- Referee: Theoretical motivation paragraph: the premise that interference is the dominant cause of weaker one-run estimates, and that diversity promotion mitigates it without side effects, is asserted but not secured by any reduction or counter-example analysis. This leaves open the possibility that the optimization introduces new biases orthogonal to interference.
Authors: The motivation draws directly from cited recent theoretical results on canary interference. While the current version does not contain an explicit reduction or counter-example, we will add a brief analysis paragraph in the revised theoretical motivation section. This will discuss why the bilevel objective is unlikely to introduce orthogonal biases, leveraging properties of the embedding space, and will reference the empirical results as supporting evidence.
revision: partial
Circularity Check
No circularity: methodological proposal remains self-contained
The paper introduces a bilevel optimization procedure for canary crafting motivated by external theoretical insights on interference, then validates it via comparative experiments. No equations or claims reduce a reported leakage estimate to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation chain. The diversity regularizer is presented as an algorithmic choice rather than a definitional identity, and the central empirical claims rest on independent benchmark comparisons rather than tautological renaming or imported uniqueness theorems.
Reference graph
Works this paper leans on
- [1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. Deep Learning with Differential Privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318, 2016.
- [2] M. Arbel and J. Mairal. Amortized Implicit Differentiation for Stochastic Bilevel Optimization. In International Conference on Learning Representations (ICLR), 2022.
- [3] J. Bae, N. Ng, A. Lo, M. Ghassemi, and R. Grosse. If Influence Functions are the Answer, Then What is the Question? In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [4] F. Barbero, X. Gu, C. A. Choquette-Choo, C. Sitawarin, M. Jagielski, I. Yona, P. Veličković, I. Shumailov, and J. Hayes. Extracting alignment data in open models. arXiv preprint arXiv:2510.18554, 2025.
- [5] R. Bassily, A. Smith, and A. Thakurta. Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. In 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, pages 464–473, Philadelphia, PA, USA, 2014. IEEE.
- [6] M. Boglioni, T. Liu, A. Ilyas, and Z. S. Wu. Optimizing Canaries for Privacy Auditing with Metagradient Descent. In International Conference on Learning Representations (ICLR), 2026.
- [7] J. Bolte, E. Pauwels, and S. Vaiter. One-step differentiation of iterative algorithms. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [8] N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284, 2019.
- [9] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, A. Oprea, and C. Raffel. Extracting Training Data from Large Language Models, 2021.
- [10] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramèr. Membership Inference Attacks From First Principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897–1914, San Francisco, CA, USA, 2022. IEEE.
- [11] T. Cebere, A. Bellet, and N. Papernot. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. In International Conference on Learning Representations (ICLR), 2025.
- [12] T. Cebere, A. Bellet, and N. Papernot. Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model. In ICLR, 2025.
- [13] M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- [14] M. Dagréou, P. Ablin, S. Vaiter, and T. Moreau. How to compute Hessian-vector products? In ICLR Blogposts, 2024.
- [15] Z. Ding, Y. Wang, G. Wang, D. Zhang, and D. Kifer. Detecting violations of differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 475–489, 2018.
- [16] J. Domke. Generic methods for optimization-based modeling. In Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
- [17] J. Dong, A. Roth, and W. J. Su. Gaussian Differential Privacy. arXiv preprint arXiv:1905.02383.
- [18] V. Doroshenko, B. Ghazi, P. Kamath, R. Kumar, and P. Manurangsi. Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions, 2022.
- [19] C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9(3-4):211–407, 2014. ISSN 1551-305X, 1551-3068.
- [20] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography, volume 3876, pages 265–284, Berlin, Heidelberg. Springer Berlin Heidelberg.
- [22] M. Even, C. Berenfeld, L. Bleistein, T. Cebere, J. Josse, and A. Bellet. Membership Inference Attacks from Causal Principles. arXiv preprint arXiv:2602.02819, 2026.
- [23] V. Feldman and C. Zhang. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [24] S. Ghadimi and G. Lan. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
- [25] S. Ghadimi and M. Wang. Approximation Methods for Bilevel Programming. arXiv preprint arXiv:1802.02246, 2018.
- [26] S. Gopi, Y. T. Lee, and L. Wutschitz. Numerical Composition of Differential Privacy. In Journal of Privacy and Confidentiality, 2024.
- [27] D. Grangier, P. Ablin, and A. Hannun. Adaptive Training Distributions with Scalable Online Bilevel Optimization. Transactions on Machine Learning Research (TMLR), 2024.
- [28] R. Grosse, J. Bae, C. Anil, N. Elhage, A. Tamkin, A. Tajdini, B. Steiner, D. Li, E. Durmus, E. Perez, E. Hubinger, K. Lukošiūtė, K. Nguyen, N. Joseph, S. McCandlish, J. Kaplan, and S. R. Bowman. Studying Large Language Model Generalization with Influence Functions. arXiv preprint arXiv:2308.03296, 2023.
- [29] N. Haim, G. Vardi, G. Yehudai, O. Shamir, and M. Irani. Reconstructing Training Data from Trained Neural Networks. In Advances in Neural Information Processing Systems (NeurIPS).
- [30] F. R. Hampel. The Influence Curve and its Role in Robust Estimation. Journal of the American Statistical Association, 69(346):383–393, 1974. ISSN 0162-1459, 1537-274X.
- [31] J. Hayes, I. Shumailov, C. A. Choquette-Choo, M. Jagielski, G. Kaissis, K. Lee, M. Nasr, S. Ghalebikesabi, N. Mireshghallah, M. Sundaram Mutu Selva Annamalai, et al. Strong membership inference attacks on massive datasets and (moderately) large language models. arXiv e-prints, pages arXiv–2505, 2025.
- [32] M. Jagielski, J. Ullman, and A. Oprea. Auditing Differentially Private Machine Learning: How Private is Private SGD? In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [33] K. Ji, J. Yang, and Y. Liang. Bilevel Optimization: Convergence Analysis and Enhanced Design. In International Conference on Machine Learning (ICML), 2021.
- [34] A. Keinan, M. Shenfeld, and K. Ligett. How Well Can Differential Privacy Be Audited in One Run? In Advances in Neural Information Processing Systems (NeurIPS), 2025.
- [35] P. W. Koh and P. Liang. Understanding Black-box Predictions via Influence Functions. In International Conference on Machine Learning (ICML), 2017.
- [36] A. Koskela, J. Jälkö, and A. Honkela. Computing Tight Differential Privacy Guarantees Using FFT. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
- [37] F. Lu, J. Munoz, M. Fuchs, T. LeBlond, E. Zaresky-Williams, E. Raff, F. Ferraro, and B. Testa. A General Framework for Auditing Differentially Private Machine Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [38] D. Maclaurin, D. Duvenaud, and R. P. Adams. Gradient-based Hyperparameter Optimization through Reversible Learning. In International Conference on Machine Learning (ICML), 2015.
- [39] S. Maddock, A. Sablayrolles, and P. Stock. CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning. In International Conference on Learning Representations (ICLR), 2023.
- [40] S. Mahloujifar, L. Melis, and K. Chaudhuri. Auditing f-Differential Privacy in One Run. In International Conference on Machine Learning (ICML), 2025.
- [41] J. Martens and R. Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. arXiv preprint arXiv:1503.05671, 2015.
- [42] M. Meeus, I. Shilov, G. Kaissis, and Y.-A. de Montjoye. Counterfactual Influence as a Distributional Quantity. In ICML Workshop MemFM, 2025.
- [43] I. Mironov. Rényi Differential Privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275, Santa Barbara, CA, 2017. IEEE.
- [44] M. Nasr, S. Song, A. Thakurta, N. Papernot, and N. Carlini. Adversary Instantiation: Lower Bounds for Differentially Private Machine Learning. In IEEE Symposium on Security and Privacy, 2021.
- [45] M. Nasr, N. Carlini, J. Hayase, M. Jagielski, A. F. Cooper, D. Ippolito, C. A. Choquette-Choo, E. Wallace, F. Tramèr, and K. Lee. Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035, 2023.
- [46] M. Nasr, J. Hayes, T. Steinke, B. Balle, F. Tramèr, M. Jagielski, N. Carlini, and A. Terzis. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23), pages 1631–1648, 2023.
- [48] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Inc., USA, 1982.
- [49] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Syst..., 2019.
- [50] B. A. Pearlmutter. Fast Exact Multiplication by the Hessian. Neural Computation, 6(1):147–160. ISSN 0899-7667, 1530-888X.
- [52] F. Pedregosa. Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning (ICML), 2016.
- [53] J. Sherman and W. J. Morrison. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix. The Annals of Mathematical Statistics, 21(1):124–127, 1950. ISSN 0003-4851.
- [54] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership Inference Attacks against Machine Learning Models. In IEEE Symposium on Security and Privacy (S&P), 2017.
- [55] S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248, Austin, TX, USA, 2013. IEEE.
- [56] T. Steinke, M. Nasr, and M. Jagielski. Privacy Auditing with One (1) Training Run. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- [58] M. Yaghini, M. Aerni, J. Zhang, F. Tramèr, and N. Papernot. OptiFluence: Scalable and Principled Design of Privacy Canaries. 2025.
- [59] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 268–282. IEEE, 2018.
- [60] A. Yousefpour, I. Shilov, A. Sablayrolles, D. Testuggine, K. Prasad, M. Malek, J. Nguyen, S. Ghosh, A. Bharadwaj, J. Zhao, G. Cormode, and I. Mironov. Opacus: User-Friendly Differential Privacy Library in PyTorch. In Advances in Neural Information Processing Systems (NeurIPS), 2021.
- [61] S. Zagoruyko and N. Komodakis. Wide Residual Networks. arXiv preprint arXiv:1605.07146.
- [62] S. Zarifzadeh, P. Liu, and R. Shokri. Low-cost high-power membership inference attacks. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=sT7UJh5CTc.
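- [63] C. Zhang, D. Ippolito, K. Lee, M. Jagielski, F. Tramèr, and N. Carlini. Counterfactual Memorization in Neural Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
A Proofs
A.1 Proposition 4.1
For convenience, we denote $\theta^* \triangleq \theta^*(X, y)$ and $\theta^*(\tilde{X}_1, \tilde{y}_1) = \theta^*_1$. We have
$$\theta^*_1 = \left(\tilde{X}_1^\top \tilde{X}_1\right)^{-1} \tilde{X}_1^\top \tilde{y}_1 = \left[\sum_{i=1}^{n} x_i x_i^\top + x_{c,1} x_{c,1}^\top\right]^{-1} \left[\sum_{i=1}^{n} y_i x_i + y_{c,1} x_{c,1}\right].$$
Let us denote $K = X^\top X$. By the Sherman-Morrison formula [53], we have
$$\begin{aligned} \theta^*_1 &= \left(\tilde{X}_1^\top \tilde{X}_1\right)^{-1} \tilde{X}_1^\top \tilde{y}_1 \\ &= \left(K^{-1} - \frac{K^{-1} x_{c,1} x_{c,1}^\top K^{-1}}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\right) \tilde{X}_1^\top \tilde{y}_1 \\ &= \left(I - \frac{K^{-1} x_{c,1} x_{c,1}^\top}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\right) K^{-1} \tilde{X}_1^\top \tilde{y}_1 \\ &= \left(I - \frac{K^{-1} x_{c,1} x_{c,1}^\top}{1 + x_{c,1}^\top K^{-1} x_{c,1}}\right) \Big(\theta^* \ldots \end{aligned}$$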
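The Sherman-Morrison step in this proof can be checked numerically. The sketch below assumes, as the derivation does, ordinary least squares with an invertible K = X^T X. It computes θ*_1 from θ* via the rank-one update and compares the result with a direct refit on the augmented data.

The update relies on K^{-1} X̃_1^T ỹ_1 = θ* + y_{c,1} K^{-1} x_{c,1}, which follows from X̃_1^T ỹ_1 = X^T y + y_{c,1} x_{c,1}. The function and variable names are illustrative.

```python
import numpy as np

def add_canary_ols(theta, K_inv, x_c, y_c):
    """Update the OLS solution theta* = K^{-1} X^T y after appending one canary
    (x_c, y_c), using the Sherman-Morrison formula instead of refitting."""
    u = K_inv @ x_c  # K^{-1} x_c
    M = np.eye(len(x_c)) - np.outer(u, x_c) / (1.0 + x_c @ u)
    return M @ (theta + y_c * u)  # (I - K^{-1} x x^T / (1 + x^T K^{-1} x)) (theta* + y_c K^{-1} x)

rng = np.random.default_rng(0)
n, d = 40, 4
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
x_c, y_c = rng.normal(size=d), 3.0

K_inv = np.linalg.inv(X.T @ X)
theta = K_inv @ X.T @ y
theta_1 = add_canary_ols(theta, K_inv, x_c, y_c)

# Direct refit on the data augmented with the canary, for comparison.
X1, y1 = np.vstack([X, x_c]), np.append(y, y_c)
theta_1_direct = np.linalg.solve(X1.T @ X1, X1.T @ y1)
print(np.allclose(theta_1, theta_1_direct))
```

Expanding the product gives a compact form:

θ*_1 − θ* = (y_{c,1} − x_{c,1}^T θ*) K^{-1} x_{c,1} / (1 + x_{c,1}^T K^{-1} x_{c,1}).

The shift caused by the canary therefore scales with its residual under θ* and is damped by its leverage.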