Trade-off Functions for DP-SGD with Subsampling based on Random Shuffling: Tight Upper and Lower Bounds
Pith reviewed 2026-05-08 13:18 UTC · model grok-4.3
The pith
DP-SGD with subsampling based on random shuffling admits tight closed-form trade-off function bounds that converge to the ideal 1 - a diagonal under suitable epoch scaling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the regime σ ≥ √(3/ln M), the trade-off function for shuffling-based DP-SGD satisfies closed-form upper and lower bounds, derived via the Berry-Esseen theorem applied to the sum of bounded random variables arising from the shuffling process, that are tight up to constant factors. Moreover, when E = c_M² M with c_M → 0, the E-fold composition of the trade-off function converges uniformly to 1 - a, with δ carrying only an O(√E) dependency.
What carries the argument
The trade-off function in the f-DP framework for shuffling-based subsampling, with its distribution characterized via the Berry-Esseen theorem on the privacy-loss random variable.
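For readers new to the f-DP machinery, the canonical closed-form example is the Gaussian trade-off function f_μ(a) = Φ(Φ⁻¹(1 - a) - μ). This is standard background from the Gaussian DP literature, not the paper's shuffling bound; a minimal sketch:

```python
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf and inv_cdf

def gaussian_tradeoff(a: float, mu: float) -> float:
    """Gaussian trade-off function f_mu(a) = Phi(Phi^{-1}(1 - a) - mu):
    the least false-negative rate achievable at false-positive rate a
    when distinguishing N(0, 1) from N(mu, 1)."""
    return N.cdf(N.inv_cdf(1.0 - a) - mu)

# mu = 0: indistinguishable distributions, f(a) = 1 - a (random guessing).
assert abs(gaussian_tradeoff(0.3, 0.0) - 0.7) < 1e-12
# Larger mu pulls the curve below the diagonal (distinguishing gets easier).
assert gaussian_tradeoff(0.3, 1.0) < 0.7
```

The paper's closed-form bounds live in this same coordinate system: a guarantee of the form f(a) ≥ 1 - a - δ says the curve never dips more than δ below the μ = 0 diagonal.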
If this is right
- For one epoch with σ = 1 and δ = 1/100, roughly M ≈ 1.14×10^6 rounds and N ≈ 1.14×10^7 training samples suffice to reach a trade-off function of at least 1 - a - δ.
- Composition over E epochs produces a δ that grows linearly in E, which restricts usable E to O(√M) if the bound is to stay meaningful.
- Under the scaling E = c_M² M with c_M → 0, the δ dependency improves to O(√E) while the composed trade-off function still converges to the ideal diagonal.
- The explicit bounds stand in contrast to known negative results for the smaller-noise regime σ ≤ 1/√(2 ln M).
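The "only δ below the 1 - a diagonal" framing in the first bullet can be sanity-checked against a Gaussian stand-in. Assuming, purely for illustration, that the closed-form bound behaves like a μ-GDP curve (the paper's actual bound may differ), the worst-case dip below the diagonal is 2Φ(μ/2) - 1, so δ = 1/100 corresponds to a very small effective μ:

```python
from statistics import NormalDist

N = NormalDist()

def max_gap_below_diagonal(mu: float) -> float:
    """sup_a [(1 - a) - f_mu(a)] for the Gaussian trade-off function;
    the supremum is attained where Phi^{-1}(1 - a) = mu / 2 and equals
    2 * Phi(mu / 2) - 1."""
    return 2.0 * N.cdf(mu / 2.0) - 1.0

def mu_for_delta(delta: float) -> float:
    """Invert the gap: the mu whose curve dips exactly delta below 1 - a."""
    return 2.0 * N.inv_cdf((1.0 + delta) / 2.0)

mu = mu_for_delta(0.01)   # roughly 0.025 for delta = 1/100
assert abs(max_gap_below_diagonal(mu) - 0.01) < 1e-12
assert mu < 0.03          # delta = 1/100 is a genuinely tiny leak
```

Under this stand-in, "δ below the diagonal" with δ = 1/100 is a strong guarantee: the adversary's best test barely beats coin-flipping at any operating point.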
Where Pith is reading between the lines
- Shuffling may be preferable to Poisson subsampling in practice when interpretable closed-form privacy accounting is needed for large training runs.
- The open question of explicit convergence rates left in the paper points to a natural next step of replacing Berry-Esseen with sharper concentration tools.
- The uniform convergence result suggests that an adversary's distinguishing advantage can be driven arbitrarily close to random guessing by modest growth in rounds relative to epochs, which could guide hyper-parameter choices in private deep learning.
- The same shuffling analysis may extend to other iterative private optimization methods where the per-step randomness is permutation-based rather than independent.
Load-bearing premise
The noise multiplier must satisfy σ ≥ √(3/ln M), where M is the number of rounds within an epoch, and the Berry-Esseen theorem must give a sufficiently accurate approximation to the distribution of the summed privacy losses from shuffling.
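The Berry-Esseen half of the premise can be probed numerically by plugging the paper's example scale into the error form C·β3/(σ³√M) quoted in the referee report. The constant 0.5600 (Shevtsova's bound for sums of independent summands) and the unit standardized third moment are assumptions made here, not values from the paper:

```python
import math

# Shevtsova's (2011) constant for sums of independent summands; whether the
# paper uses this constant or a dependent-variables variant is an assumption.
C_BE = 0.5600

def berry_esseen_error(beta3_over_sigma3: float, M: int) -> float:
    """CDF approximation error bound C * (beta_3 / sigma^3) / sqrt(M),
    in the form quoted in the referee report."""
    return C_BE * beta3_over_sigma3 / math.sqrt(M)

# With a standardized third moment of order 1 (an assumption about the
# privacy-loss summands) and the paper's example M ~ 1.14e6, the error
# is about 5e-4, well below the target delta = 1/100.
assert berry_esseen_error(1.0, 1_140_000) < 0.01
```

At this scale the normal-approximation error is an order of magnitude smaller than δ, which makes the claimed tightness at least arithmetically plausible.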
What would settle it
A numerical computation of the exact privacy-loss distribution for shuffling-based DP-SGD at σ = 1 and M around one million, showing either that the realized trade-off function deviates from the claimed closed-form bounds by more than a small constant, or that it fails to approach 1 - a uniformly under the stated epoch scaling.
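A Monte Carlo version of such a computation can be sketched. The model below is a simplified stand-in for the paper's shuffling setup (assumption: the differing example lands in exactly one uniformly random round of the epoch, contributing a unit mean shift to that round's Gaussian-noised output), and the likelihood-ratio test it evaluates gives only one point on the trade-off curve:

```python
import math, random

def llr_sample(M: int, sigma: float, under_q: bool, rng: random.Random) -> float:
    """One draw of log(q(x)/p(x)) in a simplified one-epoch shuffling model:
    p releases M independent N(0, sigma^2) round outputs; q additionally
    shifts one uniformly random round's output by 1 (the differing example).
    Then q(x)/p(x) = (1/M) * sum_j exp((x_j - 0.5) / sigma^2)."""
    x = [rng.gauss(0.0, sigma) for _ in range(M)]
    if under_q:
        x[rng.randrange(M)] += 1.0
    t = [(xj - 0.5) / sigma**2 for xj in x]
    m = max(t)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(tj - m) for tj in t)) - math.log(M)

def lr_test_rates(M=500, sigma=1.0, n=400, seed=0):
    """Estimate (alpha, beta) of the likelihood-ratio test at threshold 0."""
    rng = random.Random(seed)
    alpha = sum(llr_sample(M, sigma, False, rng) > 0 for _ in range(n)) / n
    beta = sum(llr_sample(M, sigma, True, rng) <= 0 for _ in range(n)) / n
    return alpha, beta

a, b = lr_test_rates()
# alpha + beta near 1 means this optimal test does little better than random
# guessing, consistent with the claimed convergence toward the diagonal.
assert 0.85 <= a + b <= 1.1
```

A genuine refutation would need the exact distribution (or far tighter Monte Carlo) across the whole curve, but this skeleton shows the shape such an experiment would take.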
Original abstract
We derive a tight analysis of the trade-off function for Differentially Private Stochastic Gradient Descent (DP-SGD) with subsampling based on random shuffling within the $f$-DP framework. Our analysis covers the regime $\sigma \geq \sqrt{3/\ln M}$, where $\sigma$ is the noise multiplier and $M$ is the number of rounds within a single epoch. Unlike $f$-DP analyses for Poisson subsampling, which yield non-closed implicit formulas that can be machine computed but are non-transparent, random shuffling admits a tight analysis yielding transparent and interpretable closed-form bounds. Our concrete bounds, derived via the Berry-Esseen theorem, are tight up to constant factors within the proof framework. We demonstrate worked parameter settings for a single epoch ($E=1$) with a corresponding trade-off function $\geq 1-a-\delta$, that is, only $\delta$ below the ideal random guessing diagonal $1-a$: For $\delta = 1/100$ and $\sigma = 1$, roughly $M \approx 1.14\times 10^6$ rounds and $N \approx 1.14\times 10^7$ training samples suffice to achieve meaningful differential privacy. This is in contrast to recent negative results for the regime $\sigma \leq 1/\sqrt{2 \ln M}$. Our concrete bounds can be composed over multiple epochs leading to $\delta$ having a linear in $E$ dependency, which restricts $E=O(\sqrt{M})$. To go beyond Berry--Esseen, we introduce a new proof technique based on a generalization of the law of large numbers that yields an asymptotic random guessing diagonal-limit result: if $E=c_M^2M$ with $c_M\to 0$, then the $E$-fold composed trade-off function satisfies $f^{\otimes E}(a)\to 1-a$ uniformly in $a\in[0,1]$ with $\delta$ having only an $O(\sqrt{E})$ dependency. We compare this asymptotic regime with the corresponding Poisson subsampling asymptotic, and highlight the characterization of explicit convergence rates as an open question.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives tight upper and lower bounds on the trade-off functions for differentially private stochastic gradient descent (DP-SGD) employing subsampling via random shuffling, framed within the f-DP paradigm. The analysis is restricted to the noise regime σ ≥ √(3/ln M) and leverages the Berry-Esseen theorem to obtain closed-form expressions that are claimed to be tight up to constant factors. Concrete numerical instantiations are provided for a single epoch, and an asymptotic result is established for the composition over multiple epochs using a novel generalization of the law of large numbers, showing convergence to the ideal random-guessing trade-off under suitable scaling of the number of epochs E with M.
Significance. Should the central claims hold, the work would be significant for providing interpretable, closed-form privacy analyses for random shuffling, a common practical mechanism in DP-SGD implementations, in contrast to the typically implicit and numerically computed bounds for Poisson subsampling. The explicit parameter settings (e.g., achieving δ = 0.01 with σ = 1 and large but feasible M and N) and the asymptotic uniform convergence result offer practical guidance and theoretical insight into when shuffling-based DP-SGD can approach ideal privacy guarantees. The introduction of a new concentration technique beyond Berry-Esseen is a methodological contribution.
major comments (2)
- [Main bound derivation / Berry-Esseen application] The proof invoking the Berry-Esseen theorem (in the section deriving the closed-form bounds): The cumulative privacy loss is modeled as a sum of bounded but dependent random variables due to the random shuffling (without replacement) process. Standard Berry-Esseen theorems apply to independent summands; the manuscript should explicitly state the version of Berry-Esseen employed (or any extension for exchangeable/dependent variables) and confirm that the approximation error bound C β3 / (σ³ √M) remains valid and does not exceed the claimed tightness in the regime σ ≥ √(3/ln M). This underpins the concrete bounds and the numerical example with M ≈ 1.14×10^6.
- [Asymptotic analysis] Asymptotic result (in the section on the new proof technique): The claim that if E = c_M² M with c_M → 0 then f⊗E(a) → 1-a uniformly in a ∈ [0,1] with only O(√E) dependency on δ relies on the new law-of-large-numbers generalization. Please provide more details on the convergence rate and how the error terms compose over E epochs to ensure the uniform convergence holds as stated.
minor comments (3)
- [Preliminaries / f-DP framework] Clarify the exact definition of the trade-off function f and how the lower and upper bounds are established symmetrically.
- [Discussion / Comparison section] In the comparison with Poisson subsampling, include a brief discussion of computational aspects or when one might prefer shuffling despite the regime restriction.
- [Numerical examples] The manuscript mentions 'worked parameter settings'; consider adding a table summarizing the parameters for different δ values.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the work's significance and for the constructive feedback on the technical details of the proofs. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.
Point-by-point responses
Referee: The proof invoking the Berry-Esseen theorem (in the section deriving the closed-form bounds): The cumulative privacy loss is modeled as a sum of bounded but dependent random variables due to the random shuffling (without replacement) process. Standard Berry-Esseen theorems apply to independent summands; the manuscript should explicitly state the version of Berry-Esseen employed (or any extension for exchangeable/dependent variables) and confirm that the approximation error bound C β3 / (σ³ √M) remains valid and does not exceed the claimed tightness in the regime σ ≥ √(3/ln M). This underpins the concrete bounds and the numerical example with M ≈ 1.14×10^6.
Authors: We appreciate the referee highlighting this point for improved rigor. The privacy loss variables under random shuffling form an exchangeable sequence, and our derivation applies a Berry-Esseen-type bound valid for exchangeable (dependent) random variables. We will revise the manuscript to explicitly cite the specific version of the theorem employed for exchangeable sums and add a short verification that the error term C β3 / (σ³ √M) remains o(1) and does not compromise the claimed tightness (up to constants) in the regime σ ≥ √(3/ln M). This addresses the concern directly while preserving all stated results. revision: yes
Referee: The claim that if E = c_M² M with c_M → 0 then f⊗E(a) → 1-a uniformly in a ∈ [0,1] with only O(√E) dependency on δ relies on the new law-of-large-numbers generalization. Please provide more details on the convergence rate and how the error terms compose over E epochs to ensure the uniform convergence holds as stated.
Authors: We thank the referee for requesting elaboration on the novel technique. The generalized law of large numbers bounds the per-epoch deviation of the average privacy loss by O(1/√M) uniformly. Since epochs are independent, the errors compose additively in the exponentiated sense, yielding a total deviation of order √E / √M for the E-fold composition. Substituting E = c_M² M with c_M → 0 makes this o(1) uniformly over a ∈ [0,1], giving the stated convergence with O(√E) δ-dependence. We will expand the relevant section and add an appendix with the full proof sketch of the generalized LLN, explicit rates, and error composition to make this fully transparent. revision: yes
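The composition arithmetic in this response reduces to √E/√M = c_M when E = c_M² M; a few lines make the scaling explicit (the constant is a placeholder, not the paper's):

```python
import math

def composed_deviation(M: int, c_M: float, const: float = 1.0) -> float:
    """Total deviation of order sqrt(E) / sqrt(M) after composing E epochs,
    with E = c_M**2 * M; `const` is a placeholder for the hidden constant."""
    E = c_M**2 * M
    return const * math.sqrt(E) / math.sqrt(M)  # simplifies to const * c_M

# The deviation equals c_M, so c_M -> 0 forces it to vanish even though the
# number of epochs E = c_M**2 * M may itself grow with M.
for M, c in [(10**4, 0.1), (10**6, 0.01), (10**8, 0.001)]:
    assert abs(composed_deviation(M, c) - c) < 1e-9
```

This is why the regime E = c_M² M is exactly the boundary: any faster growth of E makes √E/√M diverge, while slower growth wastes epochs.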
Circularity Check
No circularity: bounds derived from external theorems
full rationale
The derivation applies the Berry-Esseen theorem to obtain finite-M closed-form bounds on the trade-off function and invokes a generalization of the law of large numbers for the asymptotic uniform convergence result. Both are independent external results whose statements do not reference the target f-DP trade-off function or the paper's own parameters. No self-definitional equations, no fitted inputs relabeled as predictions, and no load-bearing self-citations appear. The central claims therefore remain non-tautological with respect to the paper's inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] The Berry-Esseen theorem applies to the sum of bounded random variables arising from the shuffling process.
Reference graph
Works this paper leans on
- [1] Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Edgar R. Weippl, Stefan Katzenbeisser, Christopher Kruegel, Andrew C. Myers, and Shai Halevi, editors, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, Oc...
- [2] Borja Balle, Gilles Barthe, and Marco Gaboardi. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Sy...
- [3] Jeremiah Birrell, Reza Ebrahimi, Rouzbeh Behnia, and Jason Pacheco. Differentially private stochastic gradient descent with fixed-size minibatches: Tighter RDP guarantees with or without replacement. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Pr...
- [4]
- [5] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Martin Hirt and Adam D. Smith, editors, Theory of Cryptography - 14th International Conference, TCC 2016-B, Beijing, China, October 31 - November 3, 2016, Proceedings, Part I, Lecture Notes in Computer Science, pages 635–658, 2016. doi: 10.1007...
- [6] Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang. How private are DP-SGD implementations? In Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Au...
- [7] PMLR / OpenReview.net, 2024. URL https://proceedings.mlr.press/v235/chua24a.html
- [8] Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang. Scalable DP-SGD: shuffling vs. Poisson subsampling. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neura...
- [9] Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian differential privacy. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(1):3–37, 02 2022. ISSN 1369-7412. doi: 10.1111/rssb.12454. URL https://doi.org/10.1111/rssb.12454.
- [10] Vadym Doroshenko, Badih Ghazi, Pritish Kamath, Ravi Kumar, and Pasin Manurangsi. Connect the dots: Tighter discrete approximations of privacy loss distributions. Proc. Priv. Enhancing Technol., 2022(4):552–570, 2022. doi: 10.56553/POPETS-2022-0122. URL https://doi.org/10.56553/popets-2022-0122
- [11] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014. doi: 10.1561/0400000042. URL https://doi.org/10.1561/0400000042
- [12] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, Lecture Notes in Computer Science, pages 265–284. Springer, 2006. doi: 1...
- [13] Murat Bilgehan Ertan and Marten van Dijk. Fundamental limitations of favorable privacy-utility guarantees for DP-SGD. volume abs/2601.10237, 2026. doi: 10.48550/ARXIV.2601.10237. URL https://doi.org/10.48550/arXiv.2601.10237. E-print states that the paper has been accepted at ACM CCS 2026.
- [14] C. G. Esseen. A moment inequality with an application to the central limit theorem. Scandinavian Actuarial Journal, 1956(2):160–170, 1956. doi: 10.1080/03461238.1956.10414946. URL https://doi.org/10.1080/03461238.1956.10414946
- [15] William Feller. An Introduction to Probability Theory and Its Applications, volume II. John Wiley & Sons, 2nd edition, 1971.
- [16] Sivakanth Gopi, Yin Tat Lee, and Lukas Wutschitz. Numerical composition of differential privacy. In Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, ...
- [17] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In Francis R. Bach and David M. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, JMLR Workshop and Conference Proceedings, pages 1376–1385. JMLR.org, 2015. URL http://proceedings.ml...
- [18] Antti Koskela, Joonas Jälkö, and Antti Honkela. Computing tight differential privacy guarantees using FFT. In Silvia Chiappa and Roberto Calandra, editors, The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy], Proceedings of Machine Learning Research, pages 2560–2569. ...
- [19] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Aarti Singh and Xiaojin (Jerry) Zhu, editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA...
- [20] Ilya Mironov. Rényi differential privacy. pages 263–275, 2017. doi: 10.1109/CSF.2017.11. URL https://doi.org/10.1109/CSF.2017.11
- [21]
- [22] Konstantin Mishchenko, Ahmed Khaled, and Peter Richtárik. Random reshuffling: Simple analysis with vast improvements. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 202...
- [23] Irina Shevtsova. On the absolute constants in the Berry-Esseen type inequalities for identically distributed summands. 2011. URL https://arxiv.org/abs/1111.6554
- [24] Alan Stuart and J. Keith Ord. "The normal d.f." Kendall's Advanced Theory of Statistics. Vol. 1: Distribution Theory. Originally by Maurice Kendall (5th ed.). Charles Griffin & Co., 1987.
- [25] Yu-Xiang Wang, Borja Balle, and Shiva Prasad Kasiviswanathan. Subsampled Renyi differential privacy and analytical moments accountant. In Kamalika Chaudhuri and Masashi Sugiyama, editors, The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan, Proceedings of Machine Learning Researc...