Recognition: no theorem link
Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise
Pith reviewed 2026-05-14 21:34 UTC · model grok-4.3
The pith
Kolmogorov-Arnold Networks receive population risk bounds under mini-batch DP-SGD with correlated noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We establish the first population risk bounds for KANs trained by mini-batch SGD with gradient clipping, covering non-private SGD as well as DP-SGD with Gaussian perturbations that interpolate between independent and temporally correlated noise. The results recover prior full-batch GD and independent-noise DP-GD bounds for KANs as special cases, and sharpen them when the second layer is fixed. The technical core is a new analysis route for the correlated-noise case: an auxiliary unprojected dynamics, a shifted iterate that absorbs the current noise perturbation, and a high-probability bootstrap certifying that the projection step stays inactive, which together handle temporal dependence and projection.
What carries the argument
An auxiliary unprojected dynamics together with a shifted iterate that absorbs the current noise perturbation and a high-probability bootstrap that certifies the projection step remains inactive.
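The shifted-iterate mechanism can be checked numerically on a toy problem. Everything below is our own illustration, not the paper's construction: we assume a quadratic loss and a one-parameter noise model b_t = z_t − λ z_{t−1}; the shifted iterate s_t = w_t + ηλ z_{t−1} then absorbs the correlated part of the noise exactly, leaving only a conditionally centered residual −η(1−λ) z_t.

```python
import numpy as np

# Toy check (hypothetical setup): unprojected SGD on f(w) = 0.5 * ||w||^2
# with temporally correlated noise b_t = z_t - lam * z_{t-1}. The shifted
# iterate s_t = w_t + eta * lam * z_{t-1} absorbs the correlated part,
# leaving only the conditionally centered residual -eta * (1 - lam) * z_t.
rng = np.random.default_rng(0)
eta, lam, T, d = 0.1, 0.8, 200, 5
w, z_prev = np.ones(d), np.zeros(d)
max_err = 0.0
for _ in range(T):
    z = rng.normal(size=d)
    grad = w                                    # gradient of the toy quadratic
    w_next = w - eta * (grad + z - lam * z_prev)
    s_now = w + eta * lam * z_prev              # shifted iterate before the step
    s_next = w_next + eta * lam * z             # shifted iterate after the step
    predicted = s_now - eta * grad - eta * (1.0 - lam) * z
    max_err = max(max_err, float(np.abs(s_next - predicted).max()))
    w, z_prev = w_next, z
print(max_err)  # floating-point zero: the correlated part cancels exactly
```

At λ = 1 the residual vanishes entirely, which is the cancellation the auxiliary unprojected dynamics exploits; the bootstrap is then needed only to argue that the real, projected iterates coincide with these unprojected ones.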
If this is right
- The bounds apply directly to mini-batch training used in practice.
- They cover DP-SGD mechanisms with temporally correlated Gaussian noise.
- Sharper specializations exist for KANs with a fixed second layer.
- The analysis recovers the corresponding full-batch results as special cases.
- These are the first such bounds beyond convex learning for correlated-noise DP training of neural networks.
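A minimal sketch of the training step these claims concern. The parameterization below, with noise b = z − λ·z_prev interpolating between independent (λ = 0) and strongly anti-correlated (λ near 1) noise, is our assumption for illustration; the paper's mechanism may be parameterized differently.

```python
import numpy as np

def dp_sgd_step(w, per_sample_grads, clip, sigma, lam, z_prev, eta, rng):
    """One mini-batch DP-SGD step: clip per-sample gradients, average, then
    add Gaussian noise b = z - lam * z_prev. lam = 0 gives independent noise;
    lam near 1 gives strongly anti-correlated noise. Illustrative only."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    g = clipped.mean(axis=0)
    z = sigma * rng.normal(size=w.shape)
    return w - eta * (g + z - lam * z_prev), z

rng = np.random.default_rng(0)
w, z_prev = np.zeros(3), np.zeros(3)
grads = 10.0 * rng.normal(size=(8, 3))      # batch of large per-sample gradients
w1, z1 = dp_sgd_step(w, grads, clip=1.0, sigma=0.0,
                     lam=0.5, z_prev=z_prev, eta=0.1, rng=rng)
# with sigma = 0 this reduces to plain clipped mini-batch SGD,
# so the update norm is at most eta * clip
```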
Where Pith is reading between the lines
- The technique for handling correlated noise could apply to other non-convex neural network training under DP.
- Empirical validation could involve checking if the projection remains inactive in typical KAN training runs.
- Practitioners might use these bounds to select noise correlation levels that balance privacy and accuracy.
- The results suggest that correlated noise does not necessarily worsen the theoretical guarantees compared to independent noise.
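The empirical check suggested above is easy to prototype. The toy run below is not a KAN; it uses a quadratic pull toward a point inside the constraint set and the assumed noise model b_t = z_t − λ z_{t−1}, and simply records how often projection onto a norm ball actually fires:

```python
import numpy as np

def project(w, R):
    """Euclidean projection onto the ball of radius R."""
    n = np.linalg.norm(w)
    return w if n <= R else w * (R / n)

rng = np.random.default_rng(1)
eta, T, d, R, sigma, lam = 0.05, 500, 10, 5.0, 0.5, 0.9
target = np.ones(d)                   # optimum well inside the radius-R ball
w, z_prev, hits = np.zeros(d), np.zeros(d), 0
for _ in range(T):
    z = sigma * rng.normal(size=d)
    w_raw = w - eta * ((w - target) + z - lam * z_prev)
    w_proj = project(w_raw, R)
    hits += int(not np.array_equal(w_proj, w_raw))
    w, z_prev = w_proj, z
print(hits / T)  # activation frequency; the analysis needs this to stay near 0
```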
Load-bearing premise
The high-probability bootstrap must successfully certify that the projection step is inactive so that the shifted iterate can absorb the noise without interference from clipping.
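In symbols (our schematic notation, assuming the one-parameter noise model $b_t = z_t - \lambda z_{t-1}$; the paper's definitions may differ), the premise is that the projected update coincides with its unprojected counterpart on a high-probability event:

```latex
% Projected DP-SGD update and shifted iterate (schematic, assumed noise model)
w_{t+1} = \Pi_{\mathcal{W}}\bigl(w_t - \eta\,(g_t + z_t - \lambda z_{t-1})\bigr),
\qquad
\tilde{w}_t = w_t + \eta\lambda z_{t-1}.
% If the argument of \Pi_{\mathcal{W}} stays in \mathcal{W} for all t \le T
% (the bootstrap event), the projection is the identity and
\tilde{w}_{t+1} = \tilde{w}_t - \eta\, g_t - \eta(1-\lambda)\, z_t,
% leaving only a conditionally centered residual.
```

On the complementary event, the projection perturbs the iterate and the telescoping cancellation of correlated perturbations breaks, which is why the inactivity certificate is load-bearing.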
What would settle it
Training runs of KANs under the correlated noise model where the projection step activates frequently enough to violate the population risk bounds derived in the analysis.
Figures
read the original abstract
We establish the first population risk bounds for Kolmogorov-Arnold Networks (KANs) trained by mini-batch SGD with gradient clipping, covering non-private SGD as well as differentially private SGD (DP-SGD) with Gaussian perturbations that interpolate between independent and temporally correlated noise. This setting is substantially closer to practice than prior KAN theory along two axes: training is by mini-batch SGD, the standard recipe for modern networks, rather than full-batch gradient descent (GD); and correlated-noise mechanisms have empirically shown a more favorable privacy-utility tradeoff than independent-noise mechanisms. Our results cover the corresponding full-batch GD and independent-noise DP-GD results for KANs by Wang et al. (2026), while yielding sharper fixed-second-layer specializations. The technical core is a new analysis route for correlated-noise DP training in the non-convex regime. Temporal dependence breaks the conditional-centering structure underlying standard one-step SGD arguments, and the projection step obstructs the exact cancellation structure of correlated perturbations. We address these difficulties through an auxiliary unprojected dynamics, a shifted iterate that absorbs the current noise perturbation, and a high-probability bootstrap certifying projection inactivity. Combining this optimization analysis with a stability-based generalization argument yields the stated population risk bounds. To the best of our knowledge, this is the first optimization and population risk analysis of a correlated-noise mechanism for DP training beyond convex learning, in particular for neural networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish the first population risk bounds for Kolmogorov-Arnold Networks (KANs) trained by mini-batch SGD with gradient clipping. The bounds cover both non-private SGD and DP-SGD with Gaussian perturbations that interpolate between independent and temporally correlated noise. The analysis uses an auxiliary unprojected dynamics, a shifted iterate absorbing noise, and a high-probability bootstrap to certify projection inactivity, combined with a stability-based generalization argument. This extends prior full-batch GD and independent-noise results for KANs while yielding sharper specializations for fixed second layers.
Significance. If the central claims hold, the work would be significant for providing the first optimization and population-risk analysis of correlated-noise DP mechanisms beyond convex learning, specifically for neural networks. It addresses a setting substantially closer to practice than prior KAN theory by handling mini-batch SGD and correlated noise, which empirically improves privacy-utility tradeoffs. The manuscript ships a coherent new analysis route with explicit coverage of non-convex regimes.
major comments (1)
- [Technical core (auxiliary unprojected dynamics, shifted iterate, and bootstrap argument)] The high-probability bootstrap certifying projection inactivity (described in the technical core) may fail to control clipping effects under temporally correlated noise. In non-convex KAN landscapes with mini-batching, increased correlation can raise the likelihood that gradient norms exceed the clip threshold, breaking the one-step SGD cancellation structure and preventing the population risk bounds from holding.
minor comments (1)
- The abstract states that results cover and sharpen prior work by Wang et al. (2026) but does not quantify the improvement in the fixed-second-layer specialization or state the precise assumptions on the KAN architecture and noise correlation parameter.
Simulated Author's Rebuttal
We thank the referee for their careful reading, positive assessment of the significance, and for raising this technical point on the bootstrap argument. We address it directly below.
read point-by-point responses
-
Referee: The high-probability bootstrap certifying projection inactivity (described in the technical core) may fail to control clipping effects under temporally correlated noise. In non-convex KAN landscapes with mini-batching, increased correlation can raise the likelihood that gradient norms exceed the clip threshold, breaking the one-step SGD cancellation structure and preventing the population risk bounds from holding.
Authors: We appreciate the concern. The analysis is designed to handle precisely this issue: the auxiliary unprojected dynamics together with the shifted iterate (which absorbs the current Gaussian perturbation) restore a conditional centering property even under temporal correlation. The high-probability bootstrap then certifies projection inactivity via a union-bound argument whose failure probability is controlled uniformly in the correlation parameter by the sub-Gaussian tails of the noise; the mini-batch variance is absorbed into the same concentration bound under the KAN Lipschitz and smoothness assumptions stated in the paper. Consequently the one-step cancellation structure is preserved on the high-probability event, and the population-risk bounds continue to hold. We have added a clarifying paragraph in Section 3.2 and a supporting lemma (Lemma 4.3) in the appendix that makes the uniform control explicit. revision: partial
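The union-bound arithmetic the rebuttal appeals to is elementary and indeed free of the correlation parameter. The sketch below uses a crude coordinate-wise sub-Gaussian tail of our own choosing; the actual lemma the authors describe would be sharper.

```python
import math

def bootstrap_failure_bound(T, d, sigma, r):
    """Crude union bound: the probability that any coordinate of any of the
    T Gaussian perturbations N(0, sigma^2) exceeds r in absolute value is at
    most T * d * 2 * exp(-r^2 / (2 sigma^2)). Note it does not involve lam."""
    return T * d * 2.0 * math.exp(-r ** 2 / (2.0 * sigma ** 2))

# choosing r = sigma * sqrt(2 log(2 T d / delta)) drives the bound down to delta
T, d, sigma, delta = 1000, 100, 1.0, 0.01
r = sigma * math.sqrt(2.0 * math.log(2.0 * T * d / delta))
print(bootstrap_failure_bound(T, d, sigma, r))  # equals delta = 0.01
```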
Circularity Check
No significant circularity; auxiliary dynamics and bootstrap are independent of inputs
full rationale
The paper's core derivation introduces an auxiliary unprojected dynamics, a shifted iterate absorbing noise, and a high-probability bootstrap for projection inactivity to handle correlated noise in non-convex KAN training. These constructs are presented as new technical tools that restore cancellation and control clipping effects without reducing to fitted parameters or prior results by construction. The self-citation to Wang et al. (2026) only recovers prior full-batch and independent-noise cases as specializations, while the correlated-noise population risk bounds rely on the new stability-based generalization argument applied to the fresh optimization analysis. No equation or step equates a prediction to its own input or imports uniqueness via a self-citation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions for non-convex SGD analysis and DP mechanisms (e.g., bounded gradients, Lipschitz continuity)
Reference graph
Works this paper leans on
-
[1]
A convergence theory for deep learning via over-parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization. In International Conference on Machine Learning, pages 242–252. PMLR, 2019
work page 2019
-
[2]
Joel Daniel Andersson and Rasmus Pagh. A smooth binary mechanism for efficient private continual observation. Advances in Neural Information Processing Systems, 36:49133–49145, 2023
work page 2023
-
[3]
The hitchhiker's guide to efficient, end-to-end, and tight DP auditing
Meenatchi Sundaram Muthu Selva Annamalai, Borja Balle, Jamie Hayes, Georgios Kaissis, and Emiliano De Cristofaro. The hitchhiker's guide to efficient, end-to-end, and tight DP auditing. arXiv preprint arXiv:2506.16666, 2025
-
[4]
Sanjeev Arora, Simon Du, Wei Hu, Zhiyuan Li, and Ruosong Wang. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In International Conference on Machine Learning, pages 322–332. PMLR, 2019
work page 2019
-
[5]
Gilles Barthe and Federico Olmedo. Beyond differential privacy: Composition theorems and relational logic for f-divergences between probabilistic programs. In International Colloquium on Automata, Languages, and Programming, pages 49–60. Springer, 2013
work page 2013
-
[6]
Spectrally-normalized margin bounds for neural networks
Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, volume 30, 2017
work page 2017
-
[7]
Private stochastic convex optimization with optimal rates
Raef Bassily, Vitaly Feldman, Kunal Talwar, and Abhradeep Guha Thakurta. Private stochastic convex optimization with optimal rates. In Advances in Neural Information Processing Systems, volume 32, 2019
work page 2019
-
[8]
Generalization bounds of stochastic gradient descent for wide and deep neural networks
Yuan Cao and Quanquan Gu. Generalization bounds of stochastic gradient descent for wide and deep neural networks. In Advances in Neural Information Processing Systems, volume 32, 2019
work page 2019
-
[9]
Zixiang Chen, Yuan Cao, Difan Zou, and Quanquan Gu. How much over-parameterization is sufficient to learn deep ReLU networks? In International Conference on Learning Representations, 2021
work page 2021
-
[10]
Kolmogorov–Arnold networks for genomic tasks
Oleksandr Cherednichenko and Maria Poptsova. Kolmogorov–Arnold networks for genomic tasks. Briefings in Bioinformatics, 26(2):bbaf129, 2025
work page 2025
-
[11]
Correlated noise provably beats independent noise for differentially private learning
Christopher A Choquette-Choo, Krishnamurthy Dj Dvijotham, Krishna Pillutla, Arun Ganesh, Thomas Steinke, and Abhradeep Guha Thakurta. Correlated noise provably beats independent noise for differentially private learning. In International Conference on Learning Representations, 2024
work page 2024
-
[12]
Near exact privacy amplification for matrix mechanisms
Christopher A Choquette-Choo, Arun Ganesh, Saminul Haque, Thomas Steinke, and Abhradeep Thakurta. Near exact privacy amplification for matrix mechanisms. arXiv preprint arXiv:2410.06266, 2024
-
[13]
(Amplified) banded matrix factorization: A unified approach to private training
Christopher A Choquette-Choo, Arun Ganesh, Ryan McKenna, H Brendan McMahan, John Rush, Abhradeep Guha Thakurta, and Zheng Xu. (Amplified) banded matrix factorization: A unified approach to private training. In Advances in Neural Information Processing Systems, volume 36, pages 74856–74889, 2023
work page 2023
-
[14]
Privacy amplification for matrix mechanisms
Christopher A. Choquette-Choo, Arun Ganesh, Thomas Steinke, and Abhradeep Guha Thakurta. Privacy amplification for matrix mechanisms. In International Conference on Learning Representations, 2024
work page 2024
-
[15]
Christopher A. Choquette-Choo, H. Brendan McMahan, Keith Rush, and Abhradeep Thakurta. Multi-epoch matrix factorization mechanisms for private machine learning. In International Conference on Machine Learning. JMLR.org, 2023
work page 2023
-
[16]
Improved differential privacy for SGD via optimal private linear operators on adaptive streams
Sergey Denisov, H Brendan McMahan, John Rush, Adam Smith, and Abhradeep Guha Thakurta. Improved differential privacy for SGD via optimal private linear operators on adaptive streams. In Advances in Neural Information Processing Systems, volume 35, pages 5910–5924, 2022
work page 2022
-
[17]
Understanding private learning from feature perspective
Meng Ding, Mingxi Lei, Shaopeng Fu, Shaowei Wang, Di Wang, and Jinhui Xu. Understanding private learning from feature perspective. arXiv preprint arXiv:2511.18006, 2025
-
[18]
Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming, 2006
work page 2006
-
[19]
Calibrating noise to sensitivity in private data analysis
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, pages 265–284, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg
work page 2006
-
[20]
On the convergence of two-layer Kolmogorov-Arnold networks with first-layer training
Seyed Mohammad Eshtehardian, Mohammad Hossein Yassaee, and Babak Khalaj. On the convergence of two-layer Kolmogorov-Arnold networks with first-layer training. In International Conference on Learning Representations, 2026
work page 2026
-
[21]
Constant matters: Fine-grained error bound on differentially private continual observation
Hendrik Fichtenberger, Monika Henzinger, and Jalaj Upadhyay. Constant matters: Fine-grained error bound on differentially private continual observation. In International Conference on Machine Learning, pages 10072–10092, 2023
work page 2023
-
[22]
Spencer Frei, Niladri S Chatterji, and Peter L Bartlett. Random feature amplification: Feature learning and generalization in neural networks. Journal of Machine Learning Research, 24(303):1–49, 2023
work page 2023
-
[23]
Yihang Gao and Vincent YF Tan. On the convergence of (stochastic) gradient descent for Kolmogorov–Arnold networks. IEEE Transactions on Information Theory, 2025
work page 2025
-
[24]
Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963
work page 1963
-
[25]
Arthur Jacot, Franck Gabriel, and Clément Hongler. Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems, 31, 2018
work page 2018
-
[26]
Ziwei Ji and Matus Telgarsky. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks. In International Conference on Learning Representations, 2020
work page 2020
-
[27]
Banded square root matrix factorization for differentially private model training
Nikita P Kalinin and Christoph Lampert. Banded square root matrix factorization for differentially private model training. In Advances in Neural Information Processing Systems, volume 37, pages 17602–17655, 2024
work page 2024
-
[28]
DP-λCGD: Efficient noise correlation for differentially private model training
Nikita P Kalinin, Ryan McKenna, Rasmus Pagh, and Christoph H Lampert. DP-λCGD: Efficient noise correlation for differentially private model training, 2026. arXiv preprint arXiv:2601.22334
work page 2026
-
[29]
Back to square roots: An optimal bound on the matrix factorization error for multi-epoch differentially private SGD
Nikita P. Kalinin, Ryan McKenna, Jalaj Upadhyay, and Christoph H. Lampert. Back to square roots: An optimal bound on the matrix factorization error for multi-epoch differentially private SGD. In International Conference on Learning Representations, 2026
work page 2026
-
[30]
Gradient descent with linearly correlated noise: Theory and applications to differential privacy
Anastasiia Koloskova, Ryan McKenna, Zachary Charles, John Rush, and H Brendan McMahan. Gradient descent with linearly correlated noise: Theory and applications to differential privacy. In Advances in Neural Information Processing Systems, volume 36, pages 35761–35773, 2023
work page 2023
-
[31]
Adaptive estimation of a quadratic functional by model selection
Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302–1338, 2000
work page 2000
-
[32]
Stability and generalization analysis of gradient methods for shallow neural networks
Yunwen Lei, Rong Jin, and Yiming Ying. Stability and generalization analysis of gradient methods for shallow neural networks. In Advances in Neural Information Processing Systems, volume 35, pages 38557–38570, 2022
work page 2022
-
[33]
Yunwen Lei, Puyu Wang, Yiming Ying, and Ding-Xuan Zhou. Optimization and generalization of gradient descent for shallow ReLU networks with minimal width. Journal of Machine Learning Research, 27(34):1–35, 2026
work page 2026
-
[34]
Fine-grained analysis of stability and generalization for stochastic gradient descent
Yunwen Lei and Yiming Ying. Fine-grained analysis of stability and generalization for stochastic gradient descent. In International Conference on Machine Learning, pages 5809–5819. PMLR, 2020
work page 2020
-
[35]
Longlong Li, Yipeng Zhang, Guanghui Wang, and Kelin Xia. Kolmogorov–Arnold graph neural networks for molecular property prediction. Nature Machine Intelligence, 7(8):1346–1354, 2025
work page 2025
-
[36]
Pengqi Li, Lizhong Ding, Jiarun Fu, Guoren Wang, Ye Yuan, et al. Generalization bounds for Kolmogorov-Arnold networks (KANs) and enhanced KANs with lower Lipschitz complexity. In Advances in Neural Information Processing Systems, 2025
work page 2025
-
[37]
Optimal rates for generalization of gradient descent for deep ReLU classification
Yuanfan Li, Yunwen Lei, Zheng-Chu Guo, and Yiming Ying. Optimal rates for generalization of gradient descent for deep ReLU classification. In Advances in Neural Information Processing Systems, 2026
work page 2026
-
[38]
Wei Liu, Eleni Chatzi, and Zhilu Lai. On the rate of convergence of Kolmogorov-Arnold network regression estimators. arXiv preprint arXiv:2509.19830, 2025
-
[39]
KAN: Kolmogorov-Arnold networks
Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y Hou, and Max Tegmark. KAN: Kolmogorov-Arnold networks. In International Conference on Learning Representations, 2025
work page 2025
-
[40]
Scaling up the banded matrix factorization mechanism for differentially private ML
Ryan McKenna. Scaling up the banded matrix factorization mechanism for differentially private ML. In International Conference on Learning Representations, 2025
work page 2025
-
[41]
A hassle-free algorithm for strong differential privacy in federated learning systems
Hugh Brendan McMahan, Zheng Xu, and Yanxiang Zhang. A hassle-free algorithm for strong differential privacy in federated learning systems. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 842–865, 2024
work page 2024
-
[42]
Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017
work page 2017
-
[43]
Mike Nguyen and Nicole Muecke. How many neurons do we need? A refined analysis for shallow networks trained with gradient descent. Journal of Statistical Planning and Inference, 233:106169, 2024
work page 2024
-
[44]
Gradient descent can learn less over-parameterized two-layer neural networks on classification problems
Atsushi Nitanda, Geoffrey Chinot, and Taiji Suzuki. Gradient descent can learn less over-parameterized two-layer neural networks on classification problems. arXiv preprint arXiv:1905.09870, 2019
-
[45]
Optimal rates for averaged stochastic gradient descent under neural tangent kernel regime
Atsushi Nitanda and Taiji Suzuki. Optimal rates for averaged stochastic gradient descent under neural tangent kernel regime. In International Conference on Learning Representations, 2021
work page 2021
-
[46]
Subhajit Patra, Sonali Panda, Bikram Keshari Parida, Mahima Arya, Kurt Jacobs, Denys I Bondar, and Abhijit Sen. Physics informed Kolmogorov-Arnold neural networks for dynamical analysis via efficient-KAN and wav-KAN. Journal of Machine Learning Research, 26(233):1–39, 2025
work page 2025
-
[47]
Correlated noise mechanisms for differentially private learning
Krishna Pillutla, Jalaj Upadhyay, Christopher A Choquette-Choo, Krishnamurthy Dvijotham, Arun Ganesh, Monika Henzinger, Jonathan Katz, Ryan McKenna, H Brendan McMahan, Keith Rush, et al. Correlated noise mechanisms for differentially private learning, 2025. arXiv preprint arXiv:2506.08201
-
[48]
Dominic Richards and Ilja Kuzborskij. Stability & generalisation of gradient descent for shallow neural networks without the neural tangent kernel. In Advances in Neural Information Processing Systems, volume 34. PMLR, 2021
work page 2021
-
[49]
Optimizing privacy-utility trade-off in decentralized learning with generalized correlated noise
Angelo Rodio, Zheng Chen, and Erik G Larsson. Optimizing privacy-utility trade-off in decentralized learning with generalized correlated noise. In 2025 IEEE Information Theory Workshop (ITW), pages 1–6. IEEE, 2025
work page 2025
-
[50]
Sampling-free privacy accounting for matrix mechanisms under random allocation
Jan Schuchardt and Nikita Kalinin. Sampling-free privacy accounting for matrix mechanisms under random allocation, 2026
work page 2026
-
[51]
Towards understanding generalization in DP-GD: A case study in training two-layer CNNs
Zhongjie Shi, Puyu Wang, Chenyang Zhang, and Yuan Cao. Towards understanding generalization in DP-GD: A case study in training two-layer CNNs. In AAAI Conference on Artificial Intelligence, 2026
work page 2026
-
[52]
Khemraj Shukla, Juan Diego Toscano, Zhicheng Wang, Zongren Zou, and George Em Karniadakis. A comprehensive and fair comparison between MLP and KAN representations for differential equations and operator networks. Computer Methods in Applied Mechanics and Engineering, 431:117290, 2024
work page 2024
-
[53]
Stochastic gradient descent with differentially private updates
Shuang Song, Kamalika Chaudhuri, and Anand D Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248. IEEE, 2013
work page 2013
-
[54]
Hossein Taheri and Christos Thrampoulidis. Generalization and stability of interpolating neural networks with minimal width. Journal of Machine Learning Research, 25(156):1–41, 2024
work page 2024
-
[55]
Sharper guarantees for learning neural network classifiers with gradient methods
Hossein Taheri, Christos Thrampoulidis, and Arya Mazumdar. Sharper guarantees for learning neural network classifiers with gradient methods. In International Conference on Learning Representations, 2025
work page 2025
-
[56]
Kolmogorov-Arnold networks (KANs) for time series analysis
Cristian J Vaca-Rubio, Luis Blanco, Roberto Pereira, and Màrius Caus. Kolmogorov-Arnold networks (KANs) for time series analysis. In 2024 IEEE Globecom Workshops (GC Wkshps), pages 1–6. IEEE, 2024
work page 2024
-
[57]
High-dimensional statistics: A non-asymptotic viewpoint
Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019
work page 2019
-
[58]
Optimal utility bounds for differentially private gradient descent in three-layer neural networks
Puyu Wang, Yunwen Lei, Marius Kloft, and Yiming Ying. Optimal utility bounds for differentially private gradient descent in three-layer neural networks. In 2025 IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA), pages 1–8. IEEE, 2025
work page 2025
-
[59]
Puyu Wang, Yunwen Lei, Di Wang, Yiming Ying, and Ding-Xuan Zhou. Generalization guarantees of gradient descent for shallow neural networks. Neural Computation, 37(2):344–402, 2025
work page 2025
-
[60]
Puyu Wang, Junyu Zhou, Philipp Liznerski, and Marius Kloft. Optimization, generalization and differential privacy bounds for gradient descent on Kolmogorov-Arnold networks. In International Conference on Machine Learning, 2026
work page 2026
-
[61]
On the expressiveness and spectral bias of KANs
Yixuan Wang, Jonathan W Siegel, Ziming Liu, and Thomas Y Hou. On the expressiveness and spectral bias of KANs. In International Conference on Learning Representations, 2025
work page 2025
-
[62]
Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, and Yinghua Liu. Kolmogorov–Arnold-informed neural network: A physics-informed deep learning framework for solving forward and inverse problems based on Kolmogorov–Arnold networks. Computer Methods in Applied Mechanics and Engineering, 433:117518, 2025
work page 2025
-
[63]
Subsampled Rényi differential privacy and analytical moments accountant
Yu-Xiang Wang, Borja Balle, and Shiva Prasad Kasiviswanathan. Subsampled Rényi differential privacy and analytical moments accountant. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
work page 2019
-
[64]
Differential privacy in two-layer networks: How DP-SGD harms fairness and robustness
Ruichen Xu and Kexin Chen. Differential privacy in two-layer networks: How DP-SGD harms fairness and robustness. arXiv preprint arXiv:2603.04881, 2026
work page 2026
-
[65]
Jiaming Zhang, Huanyi Xie, Meng Ding, Shaopeng Fu, Jinyan Liu, and Di Wang. Understanding the impact of differentially private training on memorization of long-tailed data. arXiv preprint arXiv:2602.03872, 2026
-
[66]
Junyu Zhou, Puyu Wang, and Ding-Xuan Zhou. Generalization analysis with deep ReLU networks for metric and similarity learning. arXiv preprint arXiv:2405.06415, 2024
-
[67]
Optimal accounting of differential privacy via characteristic function
Yuqing Zhu, Jinshuo Dong, and Yu-Xiang Wang. Optimal accounting of differential privacy via characteristic function. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
work page 2022
-
[68]
Gradient descent optimizes over-parameterized deep ReLU networks
Difan Zou, Yuan Cao, Dongruo Zhou, and Quanquan Gu. Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning, 109:467–492, 2020
work page 2020