Recognition: no theorem link
Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?
Pith reviewed 2026-05-11 00:59 UTC · model grok-4.3
The pith
Balanced Iteration Subsampling achieves stronger privacy amplification than Poisson subsampling in DP-SGD.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that Balanced Iteration Subsampling (BIS) achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum (σ → 0 and σ → ∞). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations.
What carries the argument
Balanced Iteration Subsampling (BIS), a structured scheme that assigns each sample to exactly a fixed number of iterations while preserving a uniform marginal probability of inclusion per iteration.
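To make the participation-variance contrast concrete, here is a minimal sketch (not the paper's implementation) comparing per-sample participation counts under Poisson subsampling and a BIS-style fixed-count assignment with matched per-iteration marginals. The function names are illustrative, and drawing the k iterations uniformly at random is simply the easiest way to keep the marginal uniform; the paper's actual assignment rule may differ.

```python
# Illustrative sketch only: contrasts participation-count variance under
# Poisson subsampling vs. a fixed-count (BIS-style) assignment.
import numpy as np

rng = np.random.default_rng(0)

def poisson_participation(n_samples, n_iters, q):
    """Each sample joins each iteration independently with probability q."""
    return rng.random((n_samples, n_iters)) < q  # boolean participation matrix

def balanced_participation(n_samples, n_iters, k):
    """Each sample joins exactly k of the n_iters iterations, so its
    participation count has zero variance while the per-iteration marginal
    stays at k / n_iters."""
    part = np.zeros((n_samples, n_iters), dtype=bool)
    for i in range(n_samples):
        part[i, rng.choice(n_iters, size=k, replace=False)] = True
    return part

n, T, k = 1000, 100, 10
q = k / T  # match the marginal inclusion probability of the two schemes

poisson_counts = poisson_participation(n, T, q).sum(axis=1)
balanced_counts = balanced_participation(n, T, k).sum(axis=1)

print("Poisson : mean %.2f, var %.2f" % (poisson_counts.mean(), poisson_counts.var()))
print("Balanced: mean %.2f, var %.2f" % (balanced_counts.mean(), balanced_counts.var()))
# Poisson counts are Binomial(T, q) with variance T*q*(1-q) = 9 here;
# balanced counts are exactly k, so their variance is 0.
```

Matching q = k/T keeps the marginal inclusion probability identical, so the only difference an accountant sees between the two schemes is the variance of the participation count.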
If this is right
- BIS reduces the required noise multiplier by up to 9.6 percent across the more than 60 DP-SGD configurations tested.
- The improvement appears most clearly in low-noise regimes that matter for high-utility private training.
- A near-exact Monte Carlo accountant removes the looseness of prior RDP and PLD composition bounds for BIS.
Where Pith is reading between the lines
- Frameworks could replace random sampling with BIS without changing the overall training loop structure.
- Similar fixed-count participation rules might improve privacy-utility tradeoffs in other iterative private algorithms.
Load-bearing premise
The privacy amplification advantage holds when participation variance is the dominant factor controlling the privacy-noise tradeoff.
What would settle it
An empirical or analytical comparison in which Poisson subsampling requires strictly less noise than BIS to reach the same privacy level at an intermediate noise multiplier would refute the optimality claim.
read the original abstract
Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that Balanced Iteration Subsampling (BIS), a structured scheme in which each sample participates in exactly a fixed number of iterations, achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum ($\sigma \to 0$ and $\sigma \to \infty$). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations. To translate this asymptotic theory into finite-noise guarantees, we introduce a practical near-exact Monte Carlo accountant for BIS, which removes the analytical slack of existing RDP and composition-based PLD analyses. Evaluations across more than 60 practical DP-SGD configurations show that BIS consistently outperforms Poisson subsampling in the low-noise regimes most relevant for high-utility private training, reducing the required noise multiplier by up to $9.6\%$. These results overturn the common intuition that more sampling randomness necessarily yields stronger privacy amplification: in DP-SGD, structured participation can be both more practical and more private. Our implementation is available at https://github.com/dong-xin-ao-andy/bis-mc-accountant.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Balanced Iteration Subsampling (BIS), a structured subsampling scheme for DP-SGD in which each training example participates in a fixed number of iterations. It claims to prove that BIS yields strictly stronger privacy amplification than Poisson subsampling for finite noise multipliers and is optimal at both σ → 0 and σ → ∞. The authors replace standard RDP/PLD composition with a near-exact Monte Carlo accountant for BIS, report up to 9.6% noise-multiplier reduction across more than 60 DP-SGD configurations, and release an implementation.
Significance. If the asymptotic optimality proofs hold and the Monte Carlo accountant can be shown to produce conservative upper bounds, the result would challenge the default status of Poisson subsampling and could improve the privacy-utility frontier for high-utility DP training. The open-source release of the accountant is a clear strength for reproducibility.
major comments (1)
- [Monte Carlo accountant section] The Monte Carlo accountant (described in the section on finite-noise guarantees) estimates the privacy-loss distribution by sampling participation patterns and noise realizations but provides no one-sided concentration inequality, explicit additive margin, or proven conservative bound guaranteeing that the reported (ε, δ) values upper-bound the true privacy loss. Because the 9.6% noise-reduction figures and the finite-σ superiority claim rest directly on these estimates, the absence of such a guarantee is load-bearing for the central practical contribution.
minor comments (2)
- [Abstract] The abstract states “more than 60 practical DP-SGD configurations” without specifying the exact count, the range of dataset sizes, model architectures, or the precise criteria used to select or exclude configurations.
- [Introduction / BIS definition] Notation for the fixed participation count in BIS is introduced without an explicit equation reference in the early sections; adding a numbered definition would improve readability.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The point raised about the Monte Carlo accountant is well-taken and directly affects the strength of our finite-noise claims. We address it below and will revise the manuscript to incorporate a formal conservative guarantee.
read point-by-point responses
- Referee: [Monte Carlo accountant section] The Monte Carlo accountant (described in the section on finite-noise guarantees) estimates the privacy-loss distribution by sampling participation patterns and noise realizations but provides no one-sided concentration inequality, explicit additive margin, or proven conservative bound guaranteeing that the reported (ε, δ) values upper-bound the true privacy loss. Because the 9.6% noise-reduction figures and the finite-σ superiority claim rest directly on these estimates, the absence of such a guarantee is load-bearing for the central practical contribution.
Authors: We agree that the absence of a proven conservative bound is a substantive limitation for the practical claims. The current manuscript describes the accountant as 'near-exact' on the basis of large-scale sampling but does not supply a one-sided concentration inequality or explicit margin. In the revised manuscript we will add a dedicated subsection deriving such a bound. Concretely, we will apply a Hoeffding-type inequality to the empirical (1-δ)-quantile of the sampled privacy-loss random variable, yielding an additive margin that guarantees, with probability at least 1-η, that the reported ε is an upper bound on the true privacy loss. We will state the sample size, failure probability η, and resulting margin for every reported configuration, thereby making the 9.6% noise-multiplier reductions and finite-σ comparisons formally rigorous.
Revision: yes
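For readers who want to see the shape of such a guarantee, the following is a minimal sketch of one standard construction of the kind the rebuttal gestures at: a one-sided DKW/Hoeffding bound on the empirical CDF that inflates the quantile level so the reported value upper-bounds the true quantile with confidence 1 - η. It assumes, purely for illustration, that ε is read off as a high quantile of Monte Carlo privacy-loss samples (the real PLD-to-(ε, δ) conversion is more involved); the toy Gaussian samples and function names are hypothetical, and this is not the authors' accountant.

```python
# Minimal sketch under the simplifying assumption that epsilon is reported
# as a high quantile of Monte Carlo privacy-loss samples. The one-sided
# DKW (Hoeffding-type) bound, toy samples, and names are illustrative only.
import numpy as np

def conservative_quantile(samples, level, eta):
    """Upper-bound the true `level`-quantile with probability >= 1 - eta.

    One-sided DKW/Hoeffding: sup_x (F_m(x) - F(x)) <= sqrt(ln(1/eta) / (2m))
    with probability >= 1 - eta, so the empirical quantile taken at
    level + eps dominates the true quantile at `level`.
    """
    m = len(samples)
    eps = np.sqrt(np.log(1.0 / eta) / (2.0 * m))
    if level + eps >= 1.0:
        raise ValueError("too few samples to certify this quantile level")
    return np.quantile(samples, level + eps, method="higher")

# Toy usage with a loose delta; a DP-realistic delta (e.g. 1e-5) would need
# far more samples before level + eps stays below 1.
rng = np.random.default_rng(0)
privacy_loss_samples = rng.normal(loc=1.0, scale=0.3, size=200_000)

delta, eta = 1e-2, 1e-3
eps_reported = conservative_quantile(privacy_loss_samples, 1.0 - delta, eta)
print(f"conservative epsilon at delta={delta}: {eps_reported:.3f}")
```

The inflated quantile level makes the Monte Carlo estimate one-sided at the cost of a sample-size-dependent margin, which is the trade the rebuttal promises to quantify per configuration.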
Circularity Check
No significant circularity in derivation chain
full rationale
The paper derives its main result via a mathematical proof that Balanced Iteration Subsampling (BIS) yields stronger privacy amplification than Poisson subsampling and is optimal at the noise extremes σ→0 and σ→∞, together with a separate Monte Carlo accountant for finite-σ guarantees. No equations, definitions, or claims in the abstract or described analysis reduce the optimality statement or the reported noise-reduction figures to a fitted parameter, a self-referential definition, or a load-bearing self-citation whose content is itself unverified. The governing insight—that participation variance, not randomness per se, limits amplification—is presented as the output of the new analysis rather than an input assumption. Empirical evaluations are reported as corroboration, not as the source of the theoretical claims. The derivation chain is therefore self-contained and does not lean circularly on external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Privacy amplification in subsampled DP-SGD is governed by participation variance and marginal uniformity.
invented entities (1)
- Balanced Iteration Subsampling (BIS): no independent evidence
Reference graph
Works this paper leans on
- [1] Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318.
- [2] Balle, B., Berrada, L., Charles, Z., Choquette-Choo, C. A., De, S., Doroshenko, V., Dvijotham, D., Galen, A., Ganesh, A., Ghalebikesabi, S., Hayes, J., Kairouz, P., McKenna, R., McMahan, B., Mitchell, N., Pappu, A., Ponomareva, N., Pravilov, M., Rush, K., Smith, S. L., and Stanforth, R. (2022). JAX-Privacy: Algorithms for privacy-preserving machine learning...
- [3] Choquette-Choo, C. A., Ganesh, A., Steinke, T., and Thakurta, A. (2023). Privacy amplification for matrix mechanisms. arXiv preprint arXiv:2310.15526.
- [4]
- [5] De, S., Berrada, L., Hayes, J., Smith, S. L., and Balle, B. (2022). Unlocking high-accuracy differentially private image classification through scale. arXiv preprint arXiv:2204.13650.
- [6] Dembo, A. and Zeitouni, O. (2009). Large Deviations Techniques and Applications, volume 38. Springer Science & Business Media.
- [7] Dong, A., Chen, W.-N., and Ozgur, A. (2025). Leveraging randomness in model and data partitioning for privacy amplification. In International Conference on Machine Learning, pages 13938–13962. PMLR.
- [8] Dong, A. and Ganesh, A. (2026). Privacy amplification for BandMF via b-min-sep subsampling. arXiv preprint arXiv:2602.09338.
- [9] Dong, J., Roth, A., and Su, W. J. (2022). Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3–37.
- [10] Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. (2022). Connect the dots: Tighter discrete approximations of privacy loss distributions. Proceedings on Privacy Enhancing Technologies, 2022(4):552–570.
- [11] Dwork, C., McSherry, F., Nissim, K., and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer.
- [12] Feldman, V. and Shenfeld, M. (2025). Privacy amplification by random allocation. arXiv preprint arXiv:2502.08202.
- [13] Feldman, V. and Shenfeld, M. (2026). Efficient privacy loss accounting for subsampling and random allocation. arXiv preprint arXiv:2602.17284.
- [14]
- [15] Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302–1338.
- [16]
- [17] McKenna, R., Huang, Y., Sinha, A., Balle, B., Charles, Z., Choquette-Choo, C. A., Ghazi, B., Kaissis, G., Kumar, R., Liu, R., et al. (2025). Scaling laws for differentially private language models. arXiv preprint arXiv:2501.18914.
- [18] Mironov, I. (2017). Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE.
- [19] Ponomareva, N., Hazimeh, H., Kurakin, A., Xu, Z., Denison, C., McMahan, H. B., Vassilvitskii, S., Chien, S., and Thakurta, A. G. (2023). How to DP-fy ML: A practical guide to machine learning with differential privacy. Journal of Artificial Intelligence Research, 77:1113–1201.
- [20] Sander, T., Stock, P., and Sablayrolles, A. (2023). TAN without a burn: Scaling laws of DP-SGD. In International Conference on Machine Learning, pages 29937–29949. PMLR.
- [21] Shenfeld, M. (2026). PLD accounting for subsampling and random allocation. https://github.com/moshenfeld/PLD_accounting. Commit b51429d, accessed 2026-03-18.
- [22] Wang, J. T., Mahloujifar, S., Wu, T., Jia, R., and Mittal, P. (2023). A randomized approach to tight privacy accounting. Advances in Neural Information Processing Systems, 36:33856–33893.
- [23] Wang, Y.-X., Balle, B., and Kasiviswanathan, S. P. (2019). Subsampled Rényi differential privacy and analytical moments accountant. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1226–1235. PMLR.