DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
Pith reviewed 2026-05-18 19:04 UTC · model grok-4.3
The pith
Dynamic quantization scheduling reduces accuracy loss from noise amplification in differentially private training while delivering up to 2.21 times higher theoretical throughput on low-precision hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Quantization variance grows disproportionately under the noise injection of DP-SGD and DP-Adam; this degradation is reduced by a dynamic schedule that probabilistically rotates the set of quantized layers every epoch and prioritizes quantization decisions via a differentially private loss sensitivity estimator that consumes negligible privacy budget.
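One way to build intuition for this claim is a toy stochastic-rounding quantizer. The sketch below assumes a symmetric, uniform quantizer scaled to the largest absolute value in the tensor; this page does not say which number format DPQuant uses, so treat that as an assumption. Under that assumption, additive Gaussian noise widens the value range, which enlarges the rounding step and therefore the worst-case rounding variance. This is only one plausible mechanism, not necessarily the paper's own analysis, and all names and constants are illustrative.

```python
import random

def quant_step(values, bits=4):
    """Step size of a symmetric uniform quantizer scaled to max |value|."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    return max(abs(v) for v in values) / levels

def stochastic_round(v, step, rng):
    """Round v to a multiple of step so that the result is unbiased."""
    low = (v // step) * step              # grid point at or below v
    p_up = (v - low) / step               # probability of rounding up
    return low + step if rng.random() < p_up else low

def max_rounding_variance(step):
    """Stochastic rounding has variance step^2 * p * (1 - p), at most step^2 / 4."""
    return step ** 2 / 4

rng = random.Random(0)
clean = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
# Hypothetical DP-style perturbation: Gaussian noise with an arbitrary std of 2.0.
noisy = [v + rng.gauss(0.0, 2.0) for v in clean]

step_clean = quant_step(clean)
step_noisy = quant_step(noisy)
print(max_rounding_variance(step_clean), max_rounding_variance(step_noisy))
```

Because the variance bound scales with the square of the step, even a modest widening of the range has a quadratic effect in this toy model.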
What carries the argument
DPQuant, a dynamic quantization scheduler that combines probabilistic layer rotation across epochs with a differentially private loss sensitivity estimator for layer prioritization.
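The combination of rotation and prioritization can be sketched as weighted sampling without replacement, redrawn every epoch. This is a minimal illustration, not the authors' algorithm: the softmax weighting, the `temperature` parameter, the function name and the example scores are all assumptions introduced here, and the scores stand in for whatever the private estimator would return.

```python
import math
import random

def sample_quantized_layers(impact, k, temperature=1.0, rng=None):
    """Pick k distinct layer indices; layers with low estimated impact are favoured."""
    rng = rng or random.Random()
    # Softmax over negated impact: smaller impact gives a larger weight.
    shift = min(impact)
    weights = [math.exp(-(s - shift) / temperature) for s in impact]
    remaining = list(range(len(impact)))
    chosen = []
    for _ in range(k):
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)            # sample without replacement
    return sorted(chosen)

rng = random.Random(0)
impact = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2]  # hypothetical noisy loss-impact scores
# Redraw the quantized subset at the start of every epoch.
schedule = [sample_quantized_layers(impact, k=3, rng=rng) for _ in range(5)]
for epoch, layers in enumerate(schedule):
    print(epoch, layers)
```

Sampling instead of always taking the k lowest-impact layers is what makes the subset rotate: no single layer is quantized in every epoch, while sensitive layers are still quantized less often.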
If this is right
- DPQuant outperforms static quantization baselines on accuracy-compute trade-offs for ResNet18, ResNet50, and DenseNet121.
- Theoretical throughput on low-precision hardware improves by up to 2.21 times.
- Validation accuracy remains within 2 percent of full-precision DP training.
- The same scheduling gains appear when the method is applied to DP-Adam.
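This page does not state how the theoretical throughput figure is derived. A common back-of-the-envelope model is Amdahl's law: if a fraction of the compute runs in low precision at a given speedup, the rest bounds the overall gain. The sketch below uses that generic model under assumed inputs; it does not reproduce the 2.21 times figure.

```python
def theoretical_speedup(quantized_fraction, low_precision_speedup):
    """Amdahl-style overall speedup when only part of the compute is accelerated."""
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must lie in [0, 1]")
    if low_precision_speedup <= 0:
        raise ValueError("low_precision_speedup must be positive")
    slow_part = 1.0 - quantized_fraction
    fast_part = quantized_fraction / low_precision_speedup
    return 1.0 / (slow_part + fast_part)

# Hypothetical inputs: half of the compute quantized, 4x faster low-precision units.
print(theoretical_speedup(0.5, 4.0))
```

The model makes the trade-off explicit: quantizing more layers raises throughput, but the layers left in full precision cap the achievable gain.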
Where Pith is reading between the lines
- The same rotation-plus-private-ranking pattern could be applied to other noise-injected optimizers beyond DP-SGD and DP-Adam.
- Hardware measurements on actual low-precision accelerators would be needed to confirm the claimed throughput numbers translate to wall-clock savings.
- Combining the scheduler with complementary techniques such as gradient compression could produce further efficiency gains under fixed privacy budgets.
Load-bearing premise
The differentially private loss sensitivity estimator can reliably identify which layers can be quantized with little quality impact while using only a negligible fraction of the overall privacy budget.
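To make this premise concrete, here is a hedged sketch of what such an estimator might look like: the per-example loss increase caused by quantizing one layer is clipped, summed, and perturbed with Gaussian noise. The function name, the clipping rule, and the choice to treat the batch size as public are assumptions made for illustration; the paper's Algorithm 1 may differ in every detail.

```python
import random

def private_loss_impact(loss_fp, loss_q, clip=1.0, noise_multiplier=1.0, rng=None):
    """Noisy mean of clipped per-example loss increases from quantizing one layer.

    loss_fp: per-example losses with the layer in full precision.
    loss_q:  per-example losses with the layer quantized.
    """
    rng = rng or random.Random()
    # Clip each difference to [-clip, clip] so one example moves the sum by at most `clip`.
    diffs = [max(-clip, min(clip, q - f)) for f, q in zip(loss_fp, loss_q)]
    # Gaussian mechanism on the sum: noise std scales with the clipping bound.
    noisy_sum = sum(diffs) + rng.gauss(0.0, noise_multiplier * clip)
    # Assumption: the batch size is treated as public, so dividing by it is free.
    return noisy_sum / len(diffs)

# Hypothetical per-example losses for a batch of four examples.
fp = [0.8, 1.1, 0.9, 1.3]
q = [0.9, 1.6, 0.9, 3.0]
print(private_loss_impact(fp, q, clip=1.0, noise_multiplier=1.0, rng=random.Random(0)))
```

Whether the ranking survives depends on how the noise standard deviation compares with the gaps between layer scores, which is exactly the question the premise raises.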
What would settle it
An ablation that disables the loss sensitivity estimator or increases its privacy allocation, after which accuracy-compute curves fall back to the levels of static quantization baselines.
Original abstract
Differentially-Private SGD (DP-SGD) and its adaptive variant DP-Adam are powerful techniques to protect user privacy when using sensitive data to train neural networks. During training, converting model weights and activations into low-precision formats, i.e., quantization, can drastically reduce training times, energy consumption, and cost, and is thus a widely used technique. In this work, we demonstrate for the first time that quantization causes significantly higher accuracy degradation in DP training compared to regular SGD. We observe that this is caused by noise injection, which amplifies quantization variance, leading to disproportionately large accuracy degradation. To address this challenge, we present DPQuant, a dynamic quantization framework that adaptively selects a changing subset of layers to quantize at each epoch. Our method combines two key ideas that effectively reduce quantization variance: (i) probabilistic sampling that rotates which layers are quantized every epoch, and (ii) loss-aware layer prioritization, which uses a differentially private loss sensitivity estimator to identify layers that can be quantized with minimal impact on model quality. This estimator consumes a negligible fraction of the overall privacy budget, preserving DP guarantees. Empirical evaluations on ResNet18, ResNet50, and DenseNet121 across a range of datasets demonstrate that DPQuant consistently outperforms static quantization baselines, achieving near Pareto-optimal accuracy-compute trade-offs and up to $2.21\times$ theoretical throughput improvements on low-precision hardware, with less than 2% drop in validation accuracy. We further show that our framework extends to DP-Adam with similar gains.
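For readers unfamiliar with the noise injection the abstract refers to, the usual DP-SGD gradient step clips each per-example gradient to a fixed norm and adds Gaussian noise to the sum before averaging. The sketch below is a plain-Python illustration of that textbook recipe, not the authors' implementation; the function and parameter names are assumptions.

```python
import math
import random

def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only gradients whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(g):
            total[i] += x * scale
    sigma = noise_multiplier * clip_norm  # noise std tied to the clipping bound
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]          # hypothetical per-example gradients
print(dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

The noise is added to the already clipped sum, so its magnitude is independent of the true gradient signal; that is the perturbation the abstract says interacts badly with quantization.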
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that quantization induces significantly higher accuracy degradation under DP-SGD than under standard SGD because injected noise amplifies quantization variance. It proposes DPQuant, which dynamically selects a rotating subset of layers for quantization each epoch via (i) probabilistic sampling and (ii) a loss-aware prioritization that employs a differentially private loss sensitivity estimator. The estimator is asserted to consume only a negligible fraction of the total privacy budget. Experiments on ResNet18, ResNet50 and DenseNet121 across multiple datasets report that DPQuant outperforms static quantization baselines, reaches near-Pareto-optimal accuracy-compute trade-offs, delivers up to 2.21× theoretical throughput gains on low-precision hardware, and incurs less than 2 % validation-accuracy drop; similar gains are shown for DP-Adam.
Significance. If the empirical claims and the negligible-budget property of the estimator hold, the work would be a useful practical contribution: it directly tackles the under-studied interaction between DP noise and quantization error and supplies a concrete scheduling mechanism that improves efficiency without materially harming privacy or accuracy. The reported throughput numbers and extension to DP-Adam strengthen the case for deployment on quantized hardware.
major comments (2)
- [Abstract / §4 (method)] Abstract and the description of the loss sensitivity estimator: the central claim that the estimator “consumes a negligible fraction of the overall privacy budget” and still produces reliable layer rankings is load-bearing for both the DP guarantee and the reported accuracy gains. No concrete budget split (e.g., ε_estimator / ε_total), no formula for sensitivity computation, and no stability analysis under the added DP noise are supplied; if the estimator’s own noise corrupts the ranking, the dynamic schedule may not outperform static baselines.
- [Experimental evaluation] Empirical section: the headline results (<2 % accuracy drop, 2.21× throughput, Pareto optimality) are presented without error bars, exact (ε,δ) values, or ablations that isolate the contribution of the DP estimator versus the probabilistic rotation alone. This leaves open whether the observed gains are statistically robust or dataset/architecture-specific.
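The budget split requested in the first major comment (ε_estimator / ε_total) can be made concrete with simple arithmetic. The sketch below assumes basic sequential composition, where the estimator and training budgets simply add up to the total; real accountants based on Rényi DP compose more tightly, so this is a conservative simplification. The function name and the numeric inputs are illustrative placeholders, not values reported for the paper.

```python
def split_budget(eps_total, estimator_fraction):
    """Split a total epsilon between training and the estimator (basic composition)."""
    if eps_total <= 0:
        raise ValueError("eps_total must be positive")
    if not 0.0 <= estimator_fraction < 1.0:
        raise ValueError("estimator_fraction must lie in [0, 1)")
    eps_estimator = eps_total * estimator_fraction
    eps_train = eps_total - eps_estimator
    return eps_train, eps_estimator

# Hypothetical example: total epsilon of 8.0 with 5% reserved for the estimator.
eps_train, eps_estimator = split_budget(8.0, 0.05)
print(eps_train, eps_estimator)
```

Reporting both numbers alongside δ would let a reader check the "negligible fraction" claim directly.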
minor comments (2)
- [Method] Clarify the precise definition and implementation of the probabilistic rotation schedule (e.g., sampling probability per layer, epoch-wise reselection rule) so that the method is reproducible from the text alone.
- [Figures and tables] Add error bars or confidence intervals to all accuracy and throughput plots; without them the “near Pareto-optimal” claim is difficult to evaluate.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the practical significance of addressing the interaction between DP noise and quantization. We address each major comment below and commit to revisions that strengthen the clarity and robustness of the claims without altering the core contributions.
- Referee: [Abstract / §4 (method)] Abstract and the description of the loss sensitivity estimator: the central claim that the estimator “consumes a negligible fraction of the overall privacy budget” and still produces reliable layer rankings is load-bearing for both the DP guarantee and the reported accuracy gains. No concrete budget split (e.g., ε_estimator / ε_total), no formula for sensitivity computation, and no stability analysis under the added DP noise are supplied; if the estimator’s own noise corrupts the ranking, the dynamic schedule may not outperform static baselines.
  Authors: We agree that the current presentation of the loss sensitivity estimator in §4 is high-level and that concrete details are needed to fully substantiate the negligible-budget claim and the reliability of the resulting layer rankings. The manuscript describes the estimator as a DP mechanism with bounded sensitivity, but does not provide an explicit numerical budget allocation, the precise sensitivity formula, or a stability analysis. In the revised manuscript we will add a dedicated paragraph in §4 that (i) states the exact budget split used (e.g., ε_estimator ≤ 0.05 ε_total), (ii) gives the closed-form sensitivity expression, and (iii) reports an empirical stability study comparing private versus non-private layer rankings across the evaluated architectures. These additions will directly address the concern that estimator noise could degrade the dynamic schedule.
  Revision: yes
- Referee: [Experimental evaluation] Empirical section: the headline results (<2 % accuracy drop, 2.21× throughput, Pareto optimality) are presented without error bars, exact (ε,δ) values, or ablations that isolate the contribution of the DP estimator versus the probabilistic rotation alone. This leaves open whether the observed gains are statistically robust or dataset/architecture-specific.
  Authors: We concur that the experimental section would benefit from greater statistical transparency and component-wise ablations. The current results report mean accuracy and throughput but omit standard deviations, do not list the precise (ε, δ) tuples for every configuration, and do not isolate the loss-aware prioritization from the probabilistic rotation. In the revision we will (i) add error bars (mean ± std) to all accuracy and throughput figures, (ii) explicitly tabulate the (ε, δ) values used for each dataset–architecture pair, and (iii) include a new ablation table that compares full DPQuant against a variant that uses only probabilistic rotation (i.e., without the DP loss-sensitivity estimator). These changes will demonstrate that both mechanisms contribute to the reported gains and that the improvements are statistically consistent across the evaluated settings.
  Revision: yes
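The stability study promised in the first response, comparing private against non-private layer rankings, could be approximated with a rank-correlation check. The sketch below perturbs a set of layer scores with Gaussian noise and reports the average Kendall tau against the noise-free ranking. The scores, noise levels and function names are hypothetical, and ties are ignored for simplicity.

```python
import random
from itertools import combinations

def kendall_tau(a, b):
    """Kendall rank correlation between two score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / pairs

def ranking_stability(scores, noise_std, trials=200, seed=0):
    """Average Kendall tau between clean scores and Gaussian-perturbed copies."""
    rng = random.Random(seed)
    taus = []
    for _ in range(trials):
        noisy = [s + rng.gauss(0.0, noise_std) for s in scores]
        taus.append(kendall_tau(scores, noisy))
    return sum(taus) / trials

scores = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2]  # hypothetical non-private layer scores
for std in (0.0, 0.05, 0.5):
    print(std, ranking_stability(scores, std))
```

A tau close to 1 at the noise level implied by the estimator's budget would support the claim that the private ranking is reliable; a tau near 0 would suggest the schedule is effectively random rotation.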
Circularity Check
No circularity: DPQuant is an empirical algorithmic proposal with independent evaluations
The paper proposes DPQuant as a practical dynamic quantization scheduler for DP-SGD that combines probabilistic layer rotation with a loss-aware prioritization step driven by a DP sensitivity estimator. All performance claims (accuracy, throughput, Pareto trade-offs) are backed by direct empirical measurements on ResNet18/50 and DenseNet121 across datasets, rather than any closed-form derivation or prediction that reduces to the method's own fitted quantities. The statement that the estimator consumes a negligible privacy budget fraction is presented as an implementation choice without any equation that re-derives or re-uses the same sensitivity scores to justify itself. No self-citation chain is load-bearing for the core contribution, and the work remains falsifiable by external replication on the same models and privacy parameters.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Quantization variance is amplified by the noise injection of DP-SGD, causing larger accuracy degradation than in non-private training.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "probabilistic sampling that rotates which layers are quantized every epoch, and (ii) loss-aware layer prioritization, which uses a differentially private loss sensitivity estimator"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Algorithm 1 COMPUTE LOSS IMPACT … Sampled Gaussian Mechanism … UPDATE PRIVACY"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.