Pith · machine review for the scientific record

arxiv: 2605.09890 · v1 · submitted 2026-05-11 · 💻 cs.CR · cs.LG

Recognition: 3 theorem links


Deep Learning under Fractional-Order Differential Privacy

Mohammad Partohaghighi, Roummel Marcia

Pith reviewed 2026-05-12 04:38 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords differential privacy · stochastic gradient descent · fractional order · deep learning · Rényi differential privacy · sensitivity analysis · privacy-utility tradeoff · long memory

The pith

FO-DP-SGD replaces the current gradient query with a fractional recursive one that incorporates memory from past private releases without raising the effective sensitivity beyond βC.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Fractional-Order Differentially Private Stochastic Gradient Descent as a mechanism-level change to DP-SGD. Instead of releasing a noisy version of only the current clipped subsampled gradient sum, the method first forms a fractional recursive query that blends the current sum with a finite-window power-law-weighted aggregation of prior private sum outputs. Under add/remove adjacency and Poisson subsampling, the sensitivity analysis finds that the only new data-dependent term remains the scaled current clipped sum, so the effective ℓ2-sensitivity stays at most βC. This lets the mechanism use ordinary per-step Rényi DP accounting for the Poisson-subsampled Gaussian with noise-to-sensitivity ratio σ/β, and the bounds compose to overall (ε,δ)-DP. Experiments on SVHN, CIFAR-10 and CIFAR-100 report higher test accuracy than DP-SGD and several other private optimizers while the fractional order, window size and mixing coefficient govern the sensitivity-signal-history trade-off. A sympathetic reader cares because the construction supplies a concrete way to study long-memory effects inside private optimization without paying an extra privacy price for the memory.
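The release step described above can be sketched in a few lines of NumPy. This is a hypothetical reconstruction from the review's description, not the authors' code: the power-law kernel, the window length, the parameter defaults, and the placement of the σC-scaled Gaussian noise are all assumptions consistent with the stated sum-then-noise-then-divide structure.

```python
import numpy as np

def fo_dp_sgd_step(per_example_grads, history, beta=0.7, C=1.0, sigma=1.5,
                   window=5, rng=np.random.default_rng(0)):
    """One FO-DP-SGD release (illustrative sketch, not the paper's code).

    per_example_grads: (n, d) gradients of the current Poisson-sampled batch
    history: previously *released* private sums, most recent first (mutated)
    """
    # Per-example clipping to L2 norm C, exactly as in standard DP-SGD.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    current_sum = clipped.sum(axis=0)

    # Fractional recursive query: beta * current sum plus a finite-window,
    # power-law-weighted blend of past releases. The past releases are
    # already private, so they contribute no new sensitivity.
    past = history[:window]
    weights = [(j + 1) ** -1.5 for j in range(len(past))]  # illustrative kernel
    mem = sum(w * h for w, h in zip(weights, past))

    query = beta * current_sum + mem
    # Noise calibrated to the clipping norm; with sensitivity beta*C the
    # effective noise-to-sensitivity ratio is sigma*C / (beta*C) = sigma/beta.
    release = query + rng.normal(0.0, sigma * C, size=query.shape)
    history.insert(0, release)
    return release / len(per_example_grads)  # the divide step
```

Only the memory lines differ from a plain DP-SGD release; in practice the final division would use the expected lot size, another detail not specified in the review.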

Core claim

FO-DP-SGD replaces the current-only query before noise addition with a fractional recursive query that combines the current clipped sum with a finite-window, power-law-weighted aggregation of previously released private sum-level outputs. Under add/remove adjacency with Poisson subsampling, the current-step sensitivity analysis shows that the only newly data-dependent term is the scaled current clipped sum, so the effective ℓ2-sensitivity is at most βC, where β∈(0,1] controls the current-step contribution. Thus FO-DP-SGD admits standard per-step Rényi differential privacy accounting via a Poisson-subsampled Gaussian mechanism with effective noise-to-sensitivity ratio σ/β, and the per-step bounds compose to yield overall (ε,δ)-differential privacy guarantees.
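The sensitivity argument implied here fits in a few lines. The notation is reconstructed, not quoted: q_t is the pre-noise query, S_t the current clipped sum over batch B_t, and w_j the fixed power-law weights on the W past releases.

```latex
q_t \;=\; \beta\, S_t \;+\; \sum_{j=1}^{W} w_j\, \tilde{q}_{t-j},
\qquad
S_t \;=\; \sum_{i \in B_t} \operatorname{clip}(g_i, C).

% Conditioned on the released history, the \tilde{q}_{t-j} are identical for
% add/remove-adjacent datasets D, D', so every memory term cancels:
\bigl\| q_t(D) - q_t(D') \bigr\|_2
  \;=\; \beta \,\bigl\| S_t(D) - S_t(D') \bigr\|_2
  \;\le\; \beta C.

% Adding N(0, \sigma^2 C^2 I) therefore gives noise-to-sensitivity ratio
\frac{\sigma C}{\beta C} \;=\; \frac{\sigma}{\beta}.
```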

What carries the argument

Fractional recursive query that mixes the current clipped sum with power-law-weighted past private releases before Gaussian noise is added.

If this is right

  • The privacy accounting stays identical to standard DP-SGD except for the effective sensitivity βC, allowing the usual noise calibration to be tightened by the factor β.
  • The fractional order, memory window length and mixing coefficient directly control the trade-off among current-step sensitivity, retention of past signal and influence of private history.
  • The mechanism preserves the classical sum-then-noise-then-divide structure while adding memory, so existing DP-SGD code changes are minimal.
  • Empirical results show improved test accuracy and privacy-utility curves on SVHN, CIFAR-10 and CIFAR-100 relative to DP-SGD, DP-Adam and listed baselines.
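The first bullet's "tightened by the factor β" can be sanity-checked with a back-of-envelope accountant. The sketch below uses the unsubsampled Gaussian RDP bound (order-α RDP of α/(2s²) for noise multiplier s) and the standard RDP-to-(ε,δ) conversion; it deliberately ignores subsampling amplification, which the paper's accounting would include, so the numbers are comparative only.

```python
import math

def gaussian_rdp_to_dp(noise_multiplier, steps, delta, orders=range(2, 256)):
    """(eps, delta)-DP bound via RDP of the unsubsampled Gaussian mechanism.

    One release has order-alpha RDP of alpha / (2 s^2); RDP composes
    additively over steps and converts with eps = rdp + log(1/delta)/(alpha-1).
    """
    best = float("inf")
    for a in orders:
        rdp = steps * a / (2.0 * noise_multiplier ** 2)
        best = min(best, rdp + math.log(1.0 / delta) / (a - 1))
    return best

sigma, beta, steps, delta = 1.5, 0.7, 1000, 1e-5
eps_dpsgd = gaussian_rdp_to_dp(sigma, steps, delta)         # ratio sigma
eps_fo    = gaussian_rdp_to_dp(sigma / beta, steps, delta)  # ratio sigma/beta
assert eps_fo < eps_dpsgd  # any beta < 1 strictly improves this bound
```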

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fractional-query idea could be applied to other first-order private optimizers such as DP-Adam without changing their per-step accounting.
  • Choosing the power-law weights to decay faster might reduce the carry-over of early noisy steps, potentially improving convergence speed under fixed privacy budget.
  • The construction supplies a natural testbed for studying whether long-memory private updates help on sequential or recurrent tasks beyond image classification.
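On the second bullet: the review does not reproduce the paper's weight kernel, but a standard fractional choice (Grünwald–Letnikov binomial weights, assumed here purely for illustration) shows concretely how the fractional order controls decay, and hence how much of the early noisy history is carried forward.

```python
def gl_weights(alpha, window):
    """|(-1)^j * binom(alpha, j)| for j = 0..window-1, via the usual
    recursion c_j = c_{j-1} * (j - 1 - alpha) / j. These decay like
    j^-(1+alpha): larger alpha forgets the history faster. A hypothetical
    stand-in for the paper's kernel, not the paper's definition."""
    w = [1.0]
    for j in range(1, window):
        w.append(w[-1] * abs((j - 1 - alpha) / j))
    return w

slow = gl_weights(0.3, 8)  # heavier tail: early noisy steps linger
fast = gl_weights(0.9, 8)  # lighter tail: history fades sooner
assert fast[-1] < slow[-1]
```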

Load-bearing premise

The fractional recursive query introduces no additional data-dependent terms beyond the scaled current clipped sum.

What would settle it

An explicit pair of add/remove-adjacent datasets and a concrete fractional order where the difference in the full recursive query output exceeds β times the difference in their current clipped sums would falsify the sensitivity claim.
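That test is cheap to run mechanically. The sketch below builds two add/remove-adjacent batches, fixes a shared private history, and checks the claimed inequality ‖q(D) − q(D′)‖₂ ≤ βC; the weight kernel and all parameter values are illustrative, not the paper's. A violation for any valid configuration would falsify the sensitivity claim.

```python
import numpy as np

rng = np.random.default_rng(42)
beta, C, window, d = 0.6, 1.0, 5, 10
weights = [(j + 1) ** -1.4 for j in range(window)]     # illustrative kernel
history = [rng.normal(size=d) for _ in range(window)]  # shared private history

def query(batch):
    """Fractional recursive query on one batch, conditioned on `history`."""
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    clipped = batch * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return beta * clipped.sum(axis=0) + sum(w * h for w, h in zip(weights, history))

D = rng.normal(size=(8, d)) * 3.0
D_adj = D[:-1]                    # remove one record: add/remove adjacency
gap = np.linalg.norm(query(D) - query(D_adj))
assert gap <= beta * C + 1e-9     # the memory terms cancel; only beta*clip remains
```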

Figures

Figures reproduced from arXiv: 2605.09890 by Mohammad Partohaghighi, Roummel Marcia.

Figure 1. Cross-dataset comparison of test accuracy over training epochs on SVHN, CIFAR-10, and … view at source ↗
Figure 2. Cross-dataset privacy–utility tradeoff over the fractional mixing parameter … view at source ↗
Figure 3. Joint ablation of the fractional mixing parameter … view at source ↗
Figure 4. Ablation study of the fractional memory depth … view at source ↗
Figure 5. Privacy–utility tradeoff for mechanism-level memory insertion. Test accuracy is plotted as a function of the privacy budget ε on CIFAR-10. DP-SGD serves as the no-memory private baseline. Post-FM-DP-SGD applies fractional memory after the standard DP-SGD private release, whereas FO-DP-SGD inserts the memory term directly into the sum-level private query before Gaussian perturbation. FO-DP-SGD achieves high… view at source ↗
Figure 6. Mechanism-level memory versus post-processing memory. Test accuracy is reported over training epochs on CIFAR-10. DP-SGD is the standard no-memory private baseline. Post-FM-DP-SGD applies fractional memory after the noisy DP-SGD release, while FO-DP-SGD incorporates memory before Gaussian perturbation at the sum-query level. FO-DP-SGD achieves consistently higher test accuracy and faster improvement than b… view at source ↗
Figure 7. Test accuracy under fixed noise σ = 1.5 and varying clipping norms on CIFAR-10. Test accuracy is reported over training epochs for C ∈ {0.5, 1.0, 1.5, 2.0} with fixed noise multiplier σ = 1.5. Each subplot compares DP-SGD, adaptive DP optimizers, and FO-DP-SGD under the same fixed noise multiplier. FO-DP-SGD consistently achieves the strongest accuracy trajectory across all clipping values, indicating that… view at source ↗
Figure 8. Ablation on query-level memory design. Final and best test accuracy are reported as mean ± standard deviation over multiple runs. The current-only variant is equivalent to standard DP-SGD and removes recursive memory. Uniform memory uses equal weights over previous private releases, exponential memory uses a geometric decay kernel, and FO-DP-SGD uses the proposed confidence-aware tempered fractional query… view at source ↗
Original abstract

Differentially private stochastic gradient descent (DP-SGD) is a standard approach to privacy-preserving learning based on per-example clipping, subsampling, Gaussian perturbation, and privacy accounting. Classical DP-SGD releases a noisy version of the current clipped subsampled gradient sum. We propose Fractional-Order Differentially Private Stochastic Gradient Descent (FO-DP-SGD), a mechanism-level extension that replaces this current-only query, before Gaussian noise is added, with a fractional recursive query combining the current clipped sum with a finite-window, power-law-weighted aggregation of previously released private sum-level outputs. This injects fractional memory into the release mechanism while preserving the standard sum-then-noise-then-divide structure. Under add/remove adjacency with Poisson subsampling, the current-step sensitivity analysis shows that the only newly data-dependent term is the scaled current clipped sum. Hence, conditioned on the private history, the effective ℓ2-sensitivity is at most βC, where C is the clipping threshold and β∈(0,1] controls the current-step contribution. Thus, FO-DP-SGD admits standard per-step Rényi differential privacy accounting via a Poisson-subsampled Gaussian mechanism with effective noise-to-sensitivity ratio σ/β, and composes to yield overall (ε,δ)-differential privacy guarantees. FO-DP-SGD provides a framework for studying long-memory effects in private optimization. The fractional order, memory window, and mixing coefficient govern the trade-off among current-step sensitivity, signal retention, and private-history influence. Experiments on SVHN, CIFAR-10, and CIFAR-100 show improved test accuracy and privacy–utility performance over DP-SGD and private baselines including DP-Adam, DP-IS, SA-DP-SGD, ADP-AdamW, DP-SAT, and DP-Adam-AC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Fractional-Order Differentially Private Stochastic Gradient Descent (FO-DP-SGD), a modification to DP-SGD that replaces the standard current-only clipped gradient query with a fractional recursive query: a linear combination of the current clipped sum (scaled by mixing coefficient β) and a finite-window power-law weighted aggregation of prior private outputs. Under add/remove adjacency and Poisson subsampling, it claims the effective ℓ₂-sensitivity remains at most βC (C the clipping threshold), permitting standard per-step Rényi DP accounting for the Poisson-subsampled Gaussian mechanism with noise-to-sensitivity ratio σ/β. The paper presents this as a framework for long-memory effects in private optimization and reports improved test accuracy over DP-SGD and several private baselines on SVHN, CIFAR-10, and CIFAR-100.

Significance. If the sensitivity reduction holds, the work supplies a clean mechanism-level route to injecting controllable memory into private gradient releases without altering the core sum-then-noise structure or requiring new accounting primitives beyond a rescaled Gaussian. This could facilitate systematic study of long-memory phenomena in DP training. The multi-dataset experiments provide initial evidence of utility gains, though their strength depends on reproducibility details.

major comments (2)
  1. [§3] §3 (Mechanism and Sensitivity Analysis): The central claim that, conditioned on the private history, the fractional query introduces no additional data-dependent terms beyond β times the current clipped sum (yielding effective sensitivity βC) is asserted to follow directly from the definition but is not accompanied by an explicit derivation or expansion of the query difference under add/remove neighbors. A step-by-step calculation showing that all prior terms become fixed constants under conditioning is required to confirm the bound is load-bearing for the subsequent RDP reduction.
  2. [§5] §5 (Experiments): The reported accuracy improvements lack sufficient detail on hyper-parameter search procedures, random seeds, number of runs, and exact baseline implementations (e.g., how DP-Adam, SA-DP-SGD, and ADP-AdamW were tuned and whether they used identical clipping and noise schedules). Without these, it is difficult to assess whether the gains are robust or attributable to the fractional memory rather than optimization differences.
minor comments (2)
  1. [Abstract / §1] The abstract and introduction use the phrase 'standard per-step Rényi differential privacy accounting' without citing the precise RDP composition theorem or the Poisson-subsampling RDP bound being invoked (e.g., the specific form from Mironov et al. or Abadi et al.). Adding the reference would improve traceability.
  2. [§2 / §3] Notation for the fractional order, memory window length, and power-law weights is introduced but not consistently symbolized across the mechanism definition and the sensitivity argument; a single table or equation block collecting all free parameters would aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and reproducibility.

read point-by-point responses
  1. Referee: [§3] §3 (Mechanism and Sensitivity Analysis): The central claim that, conditioned on the private history, the fractional query introduces no additional data-dependent terms beyond β times the current clipped sum (yielding effective sensitivity βC) is asserted to follow directly from the definition but is not accompanied by an explicit derivation or expansion of the query difference under add/remove neighbors. A step-by-step calculation showing that all prior terms become fixed constants under conditioning is required to confirm the bound is load-bearing for the subsequent RDP reduction.

    Authors: We agree that an explicit derivation would improve clarity. In the revised manuscript we will insert a step-by-step expansion of the query difference Δq under add/remove adjacency. We will show that, when conditioned on the shared private history, every term involving prior private outputs is identical for neighboring datasets and therefore reduces to a fixed constant; the only data-dependent contribution is the current clipped sum, which differs by at most C and is scaled by β. This confirms the ℓ₂-sensitivity bound of βC and directly justifies the subsequent Poisson-subsampled Gaussian RDP accounting. revision: yes

  2. Referee: [§5] §5 (Experiments): The reported accuracy improvements lack sufficient detail on hyper-parameter search procedures, random seeds, number of runs, and exact baseline implementations (e.g., how DP-Adam, SA-DP-SGD, and ADP-AdamW were tuned and whether they used identical clipping and noise schedules). Without these, it is difficult to assess whether the gains are robust or attributable to the fractional memory rather than optimization differences.

    Authors: We acknowledge the need for greater experimental transparency. The revised §5 will describe the hyper-parameter search procedure (grid search ranges for learning rate, β, memory window, and fractional order), the number of independent runs and random seeds employed, and the precise baseline configurations, including how clipping thresholds and noise multipliers were matched across methods to ensure fair comparison. These additions will allow readers to evaluate whether the observed gains are attributable to the fractional-memory mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper defines FO-DP-SGD explicitly as a linear combination of the current clipped sum (scaled by β) and fixed weights on prior private outputs. The sensitivity claim then follows by direct inspection: under add/remove adjacency and conditioning on history, all prior terms are constants, so the query difference reduces exactly to β times the current clipped sum difference (bounded by C). This is standard conditional sensitivity analysis for the Poisson-subsampled Gaussian mechanism and does not rely on fitted parameters, self-citations, or any reduction of the output to the input by construction. The overall RDP accounting is therefore the usual one applied to the effective ratio σ/β, making the derivation self-contained.

Axiom & Free-Parameter Ledger

3 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard differential privacy assumptions plus three new tunable parameters introduced by the mechanism itself.

free parameters (3)
  • β
    Mixing coefficient in (0,1] that scales the current clipped sum's contribution to sensitivity; chosen by the user.
  • fractional order
    Parameter controlling the power-law decay of historical private sums; introduced by the paper.
  • memory window
    Finite length of the history used in the recursive aggregation; chosen by the user.
axioms (2)
  • domain assumption Add/remove adjacency relation under Poisson subsampling
    Standard assumption for per-step DP analysis in DP-SGD; invoked for the sensitivity bound.
  • domain assumption The fractional recursive query preserves the sum-then-noise-then-divide structure with no additional data-dependent terms beyond the scaled current sum
    Required for the claim that effective sensitivity remains βC conditioned on private history.

pith-pipeline@v0.9.0 · 5646 in / 1695 out tokens · 57932 ms · 2026-05-12T04:38:55.143823+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

  1. [1] Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 308–318. ACM, 2016. doi: 10.1145/2976749.2978318
  2. [2] Eyal Amid, Ananda Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, V. Mohan Suriyakumar, Om Thakkar, and Abhradeep Thakurta. Public data-assisted mirror descent for private model training. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 517–535, 2022
  3. [3] Galen Andrew, Om Thakkar, H. Brendan McMahan, and Swaroop Ramaswamy. Differentially private learning with adaptive clipping. In Advances in Neural Information Processing Systems 34 (NeurIPS), pages 17455–17466, 2021
  4. [4] Hilal Asi, John C. Duchi, Alireza Fallah, Omid Javidbakht, and Kunal Talwar. Private adaptive gradient methods for convex optimization. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 383–392, 2021
  5. [5] Borja Balle and Yu-Xiang Wang. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 394–403, 2018
  6. [6] Chao Bao, Yifei Pu, and Yuan Zhang. Fractional-order deep backpropagation neural network. Computational Intelligence and Neuroscience, 2018:7361628, 2018
  7. [7] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pages 464–473, 2014
  8. [8] Raef Bassily, Vitaly Feldman, Kunal Talwar, and Abhradeep Thakurta. Private stochastic convex optimization with optimal rates. In Advances in Neural Information Processing Systems 32 (NeurIPS), 2019
  9. [9] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference (TCC), pages 635–658, 2016
  10. [10] Michele Caputo. Linear models of dissipation whose Q is almost frequency independent—II. Geophysical Journal International, 13(5):529–539, 1967
  11. [11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011
  12. [12] S. V. Chilukoti, I. Hossen Md, L. Shan, V. S. Tida, and X. Hei. Differentially private fine-tuned NF-Net to predict GI cancer type. arXiv preprint arXiv:2502.11329, 2025
  13. [13] Jinshuo Dong, Aaron Roth, and Weijie J. Su. Gaussian differential privacy. Journal of the Royal Statistical Society: Series B, 84(1):3–37, 2022
  14. [14] Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Now Publishers, 2014
  15. [15] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), pages 265–284, 2006
  16. [16] Vitaly Feldman and Tijana Zrnic. Individual privacy accounting via a Rényi filter. In Advances in Neural Information Processing Systems 34 (NeurIPS), pages 15798–15810, 2021
  17. [17] Peter Kairouz, H. Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. Practical and private (deep) learning without sampling or shuffling. In Proceedings of the 38th International Conference on Machine Learning (ICML), pages 5213–5225, 2021
  18. [18] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011
  19. [19] Anatoly A. Kilbas, Hari M. Srivastava, and Juan J. Trujillo. Theory and Applications of Fractional Differential Equations. Elsevier, 2006
  20. [20] Tian Li, Manzil Zaheer, Kuan Zhoulu Liu, Sashank J. Reddi, H. Brendan McMahan, and Virginia Smith. Differentially private adaptive optimization with delayed preconditioners. In International Conference on Learning Representations (ICLR), 2023
  21. [21] H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. Learning differentially private recurrent language models. In International Conference on Learning Representations (ICLR), 2018
  22. [22] Kenneth S. Miller and Bertram Ross. An Introduction to the Fractional Calculus and Fractional Differential Equations. Wiley, 1993
  23. [23] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017. doi: 10.1109/CSF.2017.11
  24. [24] Keith B. Oldham and Jerome Spanier. The Fractional Calculus: Theory and Applications of Differentiation and Integration to Arbitrary Order. Academic Press, 1974
  25. [25] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. In International Conference on Learning Representations (ICLR), 2017
  26. [26] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Úlfar Erlingsson. Scalable private learning with PATE. In International Conference on Learning Representations (ICLR), 2018
  27. [27] Jongjin Park et al. Differentially private sharpness-aware training. In International Conference on Machine Learning (ICML), 2023
  28. [28] NhatHai Phan, Xiaoqian Wu, and Dejing Dou. Preserving differential privacy in convolutional deep belief networks. Machine Learning, 106:1681–1704, 2017
  29. [29] Igor Podlubny. Fractional Differential Equations. Academic Press, 1998
  30. [30] Stefan G. Samko, Anatoly A. Kilbas, and Oleg I. Marichev. Fractional Integrals and Derivatives: Theory and Applications. Gordon and Breach, 1993
  31. [31] Dongming Sheng, Yongguang Wei, YangQuan Chen, and Yuesheng Wang. Convolutional neural networks with fractional order gradient method. Neurocomputing, 408:19–29, 2020
  32. [32] Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 245–248, 2013
  33. [33] Qingyuan Tang, Fedor Shpilevskiy, and Mathias Lécuyer. DP-AdamBC: Your DP-Adam is actually DP-SGD (unless you apply bias correction). In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, 2024
  34. [34] Yu-Xiang Wang, Borja Balle, and Shiva Prasad Kasiviswanathan. Subsampled Rényi differential privacy and analytical moments accountant. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), volume 89 of Proceedings of Machine Learning Research, pages 1226–1235, 2019
  35. [35] Jianxin Wei, Ergute Bao, Xiaokui Xiao, and Yin Yang. DPIS: An enhanced mechanism for differentially private SGD with importance sampling. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2885–2899, 2022. doi: 10.1145/3548606.3560562
  36. [36] Yongguang Wei, Yuqing Kang, Weijie Yin, and Yuesheng Wang. Design of generalized fractional order gradient descent method. arXiv preprint arXiv:1901.05294, 2018
  37. [37] R. Yang. DP-Adam-AC: Privacy-preserving fine-tuning of localizable language models using Adam optimization with adaptive clipping. arXiv preprint arXiv:2510.05288, 2025