Pith · machine review for the scientific record

arxiv: 2605.07072 · v1 · submitted 2026-05-08 · 💻 cs.LG · cs.CR · stat.ML

Recognition: no theorem link

Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?

Authors on Pith · no claims yet

Pith reviewed 2026-05-11 00:59 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · stat.ML
keywords differential privacy · DP-SGD · subsampling · privacy amplification · stochastic gradient descent · machine learning

The pith

Balanced Iteration Subsampling achieves stronger privacy amplification than Poisson subsampling in DP-SGD.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Poisson subsampling, the default in differentially private stochastic gradient descent, creates unnecessary variance in how often each data point appears across training iterations. This variance weakens the privacy amplification that comes from adding noise. Balanced Iteration Subsampling instead fixes each point to participate in exactly the same number of iterations while keeping the average participation rate uniform. The result is stronger privacy guarantees, especially when noise is low, which allows less noise to be added for the same privacy level and therefore higher model accuracy. The authors supply both limiting-case proofs and a Monte Carlo accountant that gives tight finite-noise bounds.
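As an editorial illustration (not code from the paper), the participation-variance gap described above can be simulated directly: under Poisson subsampling, a point's participation count over T iterations is Binomial(T, q), while a fixed-count scheme pins it at qT exactly.

```python
import random

# Hypothetical sketch, not the authors' implementation: under Poisson
# subsampling each point joins each of T iterations independently with
# probability q, so its participation count is Binomial(T, q) with
# variance T*q*(1-q). A fixed-count scheme such as BIS makes the count
# exactly q*T for every point, with zero variance.

def poisson_participation_count(T, q, rng):
    """Iterations one data point joins under Poisson subsampling."""
    return sum(1 for _ in range(T) if rng.random() < q)

rng = random.Random(0)
T, q = 1000, 0.01  # 1000 iterations at a 1% sampling rate
counts = [poisson_participation_count(T, q, rng) for _ in range(2000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(f"Poisson: mean count ~ {mean:.2f}, variance ~ {var:.2f}")
print(f"Fixed-count: every point appears exactly {int(T * q)} times")
```

The empirical variance sits near T·q·(1−q) ≈ 9.9 participations squared: the spread the paper identifies as the source of suboptimal amplification.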

Core claim

We prove that Balanced Iteration Subsampling (BIS) achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum (σ → 0 and σ → ∞). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations.

What carries the argument

Balanced Iteration Subsampling (BIS), a deterministic scheme that assigns each sample to a fixed number of iterations while maintaining the same marginal probability of inclusion per iteration.
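One plausible instantiation of this definition (an editorial sketch; the sampling-without-replacement construction and all names are assumptions, not the paper's code) assigns each sample a uniformly random k-subset of the T iterations, which fixes the participation count at k while keeping the marginal inclusion probability at k/T for every iteration:

```python
import random

# Editorial sketch of a BIS-style assignment (hypothetical construction):
# each sample joins exactly k of the T iterations, chosen uniformly at
# random, so its participation count is fixed while each iteration's
# marginal inclusion probability is k / T.

def bis_assign(n_samples, T, k, rng):
    """For each sample, pick the set of iterations it participates in."""
    return [set(rng.sample(range(T), k)) for _ in range(n_samples)]

rng = random.Random(0)
n, T, k = 200, 50, 5
assignment = bis_assign(n, T, k, rng)

# Fixed participation: every sample appears in exactly k iterations.
assert all(len(iters) == k for iters in assignment)

# Uniform marginal: each iteration's empirical inclusion frequency
# concentrates around k / T = 0.1.
freq = [sum(t in iters for iters in assignment) / n for t in range(T)]
print(f"inclusion frequency range: {min(freq):.2f}-{max(freq):.2f} (target {k / T:.2f})")
```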

If this is right

  • Across more than 60 tested DP-SGD configurations, BIS reduces the required noise multiplier by up to 9.6 percent.
  • The improvement appears most clearly in low-noise regimes that matter for high-utility private training.
  • A near-exact Monte Carlo accountant removes the looseness of prior RDP and PLD composition bounds for BIS.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Frameworks could replace random sampling with BIS without changing the overall training loop structure.
  • Similar fixed-count participation rules might improve privacy-utility tradeoffs in other iterative private algorithms.

Load-bearing premise

The privacy amplification advantage holds when participation variance is the dominant factor controlling the privacy-noise tradeoff.

What would settle it

An empirical or analytical comparison in which Poisson subsampling requires strictly less noise than BIS to reach the same privacy level at an intermediate noise multiplier would refute the optimality claim.

read the original abstract

Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that Balanced Iteration Subsampling (BIS), a structured scheme in which each sample participates in exactly a fixed number of iterations, achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum ($\sigma \to 0$ and $\sigma \to \infty$). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations. To translate this asymptotic theory into finite-noise guarantees, we introduce a practical near-exact Monte Carlo accountant for BIS, which removes the analytical slack of existing RDP and composition-based PLD analyses. Evaluations across more than 60 practical DP-SGD configurations show that BIS consistently outperforms Poisson subsampling in the low-noise regimes most relevant for high-utility private training, reducing the required noise multiplier by up to $9.6\%$. These results overturn the common intuition that more sampling randomness necessarily yields stronger privacy amplification: in DP-SGD, structured participation can be both more practical and more private. Our implementation is available at https://github.com/dong-xin-ao-andy/bis-mc-accountant.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Balanced Iteration Subsampling (BIS), a structured subsampling scheme for DP-SGD in which each training example participates in a fixed number of iterations. It claims to prove that BIS yields strictly stronger privacy amplification than Poisson subsampling for finite noise multipliers and is optimal at both σ → 0 and σ → ∞. The authors replace standard RDP/PLD composition with a near-exact Monte Carlo accountant for BIS, report up to 9.6% noise-multiplier reduction across more than 60 DP-SGD configurations, and release an implementation.

Significance. If the asymptotic optimality proofs hold and the Monte Carlo accountant can be shown to produce conservative upper bounds, the result would challenge the default status of Poisson subsampling and could improve the privacy-utility frontier for high-utility DP training. The open-source release of the accountant is a clear strength for reproducibility.

major comments (1)
  1. [Monte Carlo accountant section] The Monte Carlo accountant (described in the section on finite-noise guarantees) estimates the privacy-loss distribution by sampling participation patterns and noise realizations but provides no one-sided concentration inequality, explicit additive margin, or proven conservative bound guaranteeing that the reported (ε, δ) values upper-bound the true privacy loss. Because the 9.6% noise-reduction figures and the finite-σ superiority claim rest directly on these estimates, the absence of such a guarantee is load-bearing for the central practical contribution.
minor comments (2)
  1. [Abstract] The abstract states “more than 60 practical DP-SGD configurations” without specifying the exact count, the range of dataset sizes, model architectures, or the precise criteria used to select or exclude configurations.
  2. [Introduction / BIS definition] Notation for the fixed participation count in BIS is introduced without an explicit equation reference in the early sections; adding a numbered definition would improve readability.
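To make the major comment concrete, here is a toy version of a Monte Carlo privacy-loss accountant (an editorial simplification, not the paper's implementation): for a point that participates in exactly k Gaussian-mechanism steps with noise multiplier σ and unit sensitivity, the composed privacy loss is N(μ, 2μ) with μ = k/(2σ²), and δ(ε) can be estimated as the empirical mean of max(0, 1 − e^(ε−L)).

```python
import math
import random

# Toy Monte Carlo accountant (editorial simplification, not the paper's
# code). For a point in exactly k Gaussian steps with noise multiplier
# sigma and unit sensitivity, the composed privacy loss is
# L ~ N(mu, 2*mu) with mu = k / (2 * sigma**2), and
# delta(eps) = E[max(0, 1 - exp(eps - L))].

def mc_delta(eps, k, sigma, n_samples, rng):
    """Plain Monte Carlo estimate of delta(eps); no conservative margin."""
    mu = k / (2 * sigma ** 2)
    std = math.sqrt(2 * mu)
    total = 0.0
    for _ in range(n_samples):
        loss = rng.gauss(mu, std)  # one sampled privacy-loss realization
        total += max(0.0, 1.0 - math.exp(eps - loss))
    return total / n_samples

rng = random.Random(0)
delta_hat = mc_delta(eps=2.0, k=10, sigma=4.0, n_samples=200_000, rng=rng)
print(f"estimated delta at eps=2: {delta_hat:.2e}")
```

The estimate is unbiased, but exactly as the referee observes, a plain estimator like this carries no one-sided guarantee: a finite-sample fluctuation can under-report δ.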

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful and constructive review. The point raised about the Monte Carlo accountant is well-taken and directly affects the strength of our finite-noise claims. We address it below and will revise the manuscript to incorporate a formal conservative guarantee.

read point-by-point responses
  1. Referee: [Monte Carlo accountant section] The Monte Carlo accountant (described in the section on finite-noise guarantees) estimates the privacy-loss distribution by sampling participation patterns and noise realizations but provides no one-sided concentration inequality, explicit additive margin, or proven conservative bound guaranteeing that the reported (ε, δ) values upper-bound the true privacy loss. Because the 9.6% noise-reduction figures and the finite-σ superiority claim rest directly on these estimates, the absence of such a guarantee is load-bearing for the central practical contribution.

    Authors: We agree that the absence of a proven conservative bound is a substantive limitation for the practical claims. The current manuscript describes the accountant as 'near-exact' on the basis of large-scale sampling but does not supply a one-sided concentration inequality or explicit margin. In the revised manuscript we will add a dedicated subsection deriving such a bound. Concretely, we will apply a Hoeffding-type inequality to the empirical (1-δ)-quantile of the sampled privacy-loss random variable, yielding an additive margin that guarantees, with probability at least 1-η, that the reported ε is an upper bound on the true privacy loss. We will state the sample size, failure probability η, and resulting margin for every reported configuration, thereby making the 9.6% noise-multiplier reductions and finite-σ comparisons formally rigorous. revision: yes
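The correction the simulated rebuttal proposes can be sketched in a few lines (an editorial illustration of one such bound, here applied to the mean-form δ estimator rather than the quantile; all parameter names are assumptions): since each Monte Carlo term lies in [0, 1], a one-sided Hoeffding inequality yields an additive margin under which the reported δ upper-bounds the truth with probability at least 1 − η.

```python
import math

# Editorial sketch of a conservative correction in the spirit of the
# rebuttal (applied to the mean-form delta estimator; the revised paper
# may bound the empirical quantile instead). Each Monte Carlo term lies
# in [0, 1], so a one-sided Hoeffding bound gives: with probability
# >= 1 - eta, true delta <= delta_hat + sqrt(ln(1 / eta) / (2 * n)).

def conservative_delta(delta_hat, n_samples, eta):
    """Hoeffding upper bound on delta with failure probability eta."""
    margin = math.sqrt(math.log(1.0 / eta) / (2.0 * n_samples))
    return delta_hat + margin

# With 10^7 samples and eta = 10^-6 the margin is roughly 8.3e-4, which
# dominates small delta targets and shows why sample size and eta must
# be reported alongside every configuration.
upper = conservative_delta(delta_hat=1e-5, n_samples=10**7, eta=1e-6)
print(f"conservative delta bound: {upper:.2e}")
```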

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives its main result via a mathematical proof that Balanced Iteration Subsampling (BIS) yields stronger privacy amplification than Poisson subsampling and is optimal at the noise extremes σ→0 and σ→∞, together with a separate Monte Carlo accountant for finite-σ guarantees. No equations, definitions, or claims in the abstract or described analysis reduce the optimality statement or the reported noise-reduction figures to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose content is itself unverified. The governing insight—that participation variance, not randomness per se, limits amplification—is presented as the output of the new analysis rather than an input assumption. Empirical evaluations are reported as corroboration, not as the source of the theoretical claims. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the newly introduced BIS scheme and the Monte Carlo accountant; the analysis assumes standard differential privacy composition rules and that participation counts are the dominant factor in amplification.

axioms (1)
  • domain assumption Privacy amplification in subsampled DP-SGD is governed by participation variance and marginal uniformity.
    Invoked to establish optimality of BIS over Poisson subsampling.
invented entities (1)
  • Balanced Iteration Subsampling (BIS) no independent evidence
    purpose: Structured subsampling scheme that fixes the number of participations per sample.
    Newly proposed mechanism whose privacy properties are analyzed in the paper.

pith-pipeline@v0.9.0 · 5582 in / 1251 out tokens · 38428 ms · 2026-05-11T00:59:57.602873+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

23 extracted references · 9 canonical work pages

  1. [1]

    Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308--318

  2. [2]

    Balle, B., Berrada, L., Charles, Z., Choquette-Choo, C. A., De, S., Doroshenko, V., Dvijotham, D., Galen, A., Ganesh, A., Ghalebikesabi, S., Hayes, J., Kairouz, P., McKenna, R., McMahan, B., Mitchell, N., Pappu, A., Ponomareva, N., Pravilov, M., Rush, K., Smith, S. L., and Stanforth, R. (2022). JAX-Privacy: Algorithms for privacy-preserving machine lea...

  3. [3]

    Choquette-Choo, C. A., Ganesh, A., Steinke, T., and Thakurta, A. (2023). Privacy amplification for matrix mechanisms. arXiv preprint arXiv:2310.15526

  4. [4]

    Chua, L., Ghazi, B., Harrison, C., Leeman, E., Kamath, P., Kumar, R., Manurangsi, P., Sinha, A., and Zhang, C. (2024). Balls-and-bins sampling for DP-SGD. arXiv preprint arXiv:2412.16802

  5. [5]

    De, S., Berrada, L., Hayes, J., Smith, S. L., and Balle, B. (2022). Unlocking high-accuracy differentially private image classification through scale. arXiv preprint arXiv:2204.13650

  6. [6]

    Dembo, A. and Zeitouni, O. (2009). Large Deviations Techniques and Applications, volume 38. Springer Science & Business Media

  7. [7]

    Dong, A., Chen, W.-N., and Ozgur, A. (2025). Leveraging randomness in model and data partitioning for privacy amplification. In International Conference on Machine Learning, pages 13938--13962. PMLR

  8. [8]

    Dong, A. and Ganesh, A. (2026). Privacy amplification for BandMF via b-min-sep subsampling. arXiv preprint arXiv:2602.09338

  9. [9]

    Dong, J., Roth, A., and Su, W. J. (2022). Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3--37

  10. [10]

    Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. (2022). Connect the dots: Tighter discrete approximations of privacy loss distributions. Proceedings on Privacy Enhancing Technologies, 2022(4):552--570

  11. [11]

    Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265--284. Springer

  12. [12]

    Feldman, V. and Shenfeld, M. (2025). Privacy amplification by random allocation. arXiv preprint arXiv:2502.08202

  13. [13]

    Feldman, V. and Shenfeld, M. (2026). Efficient privacy loss accounting for subsampling and random allocation. arXiv preprint arXiv:2602.17284

  14. [14]

    Ganesh, A. (2025). Tighter privacy analysis for truncated Poisson sampling. arXiv preprint arXiv:2508.15089

  15. [15]

    Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302--1338

  16. [16]

    Li, X., Tramer, F., Liang, P., and Hashimoto, T. (2021). Large language models can be strong differentially private learners. arXiv preprint arXiv:2110.05679

  17. [17]

    McKenna, R., Huang, Y., Sinha, A., Balle, B., Charles, Z., Choquette-Choo, C. A., Ghazi, B., Kaissis, G., Kumar, R., Liu, R., et al. (2025). Scaling laws for differentially private language models. arXiv preprint arXiv:2501.18914

  18. [18]

    Mironov, I. (2017). Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263--275. IEEE

  19. [19]

    Ponomareva, N., Hazimeh, H., Kurakin, A., Xu, Z., Denison, C., McMahan, H. B., Vassilvitskii, S., Chien, S., and Thakurta, A. G. (2023). How to DP-fy ML: A practical guide to machine learning with differential privacy. Journal of Artificial Intelligence Research, 77:1113--1201

  20. [20]

    Sander, T., Stock, P., and Sablayrolles, A. (2023). TAN without a burn: Scaling laws of DP-SGD. In International Conference on Machine Learning, pages 29937--29949. PMLR

  21. [21]

    Shenfeld, M. (2026). Pld accounting for subsampling and random allocation. https://github.com/moshenfeld/PLD_accounting. Commit b51429d, accessed 2026-03-18

  22. [22]

    Wang, J. T., Mahloujifar, S., Wu, T., Jia, R., and Mittal, P. (2023). A randomized approach to tight privacy accounting. Advances in Neural Information Processing Systems, 36:33856--33893

  23. [23]

    Wang, Y.-X., Balle, B., and Kasiviswanathan, S. P. (2019). Subsampled Rényi differential privacy and analytical moments accountant. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1226--1235. PMLR