pith. machine review for the scientific record

arxiv: 2605.13503 · v1 · submitted 2026-05-13 · 💻 cs.CR · cs.LG

Recognition: no theorem link

Limits of Personalizing Differential Privacy Budgets

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 18:12 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords differential privacy · personalized privacy budgets · mean estimation · thresholding operator · privacy-utility trade-off · limitations of personalization · constant-factor improvements

The pith

For mean estimation, a simple thresholding operator on privacy budgets captures nearly all the utility gains of full personalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that personalized differential privacy budgets have significant limitations in practice. For the task of mean estimation, the main driver of utility is selecting the right overall privacy level rather than customizing it for every user. This selection can be accomplished with a basic thresholding operator that effectively ignores the strictest privacy demands. Full personalization provides only limited, constant-factor improvements over this baseline in settings with mixed public-private data or two-tier privacy requirements. The authors also derive upper bounds on the possible gains for arbitrary privacy preference distributions.

Core claim

Personalized budgets come with major limitations, and for mean estimation the dominant factor is not full personalization but choosing the right effective privacy budget through a simple thresholding operator. Compared with this thresholding baseline, the gains from fully personalized mechanisms are limited to constant factors in mixed private and public datasets and in datasets with two levels of privacy requirements, with upper bounds established for arbitrary requirements.

What carries the argument

The thresholding operator that determines an effective uniform privacy budget by filtering the most demanding individual requirements.
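A minimal sketch of the operator, assuming the clipped-weight form that appears in the paper's Claim 1 (w_j = min{ε^(j), τ} / s_τ, with s_τ the sum of the clipped budgets); the threshold and budget values below are illustrative, not taken from the paper.

```python
import numpy as np

def threshold_weights(eps, tau):
    """Clip each personal budget at tau, then normalize.

    Implements w_j = min(eps_j, tau) / s_tau with
    s_tau = sum_j min(eps_j, tau), the form in the paper's Claim 1.
    """
    clipped = np.minimum(np.asarray(eps, dtype=float), tau)
    return clipped / clipped.sum()

# Illustrative budgets: two strict users, one moderate, one nearly public.
eps = [0.1, 0.5, 1.0, 10.0]
w = threshold_weights(eps, tau=1.0)
# The strictest demand (0.1) no longer dictates the whole mechanism,
# and the 10.0 budget is capped at tau, so it gets the same weight as 1.0.
```

Filtering here means capping, not discarding: strict users still contribute, but very loose budgets cannot inflate their weight beyond the threshold.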

If this is right

  • In mixed public and private datasets, full personalization improves utility by at most a constant factor over thresholding.
  • In datasets with two levels of privacy requirements, similar constant-factor bounds apply.
  • For arbitrary privacy requirements, upper bounds limit the maximal gain from personalization.
  • The utility is primarily determined by the choice of effective budget rather than per-user customization.
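Back-of-the-envelope arithmetic for the mixed public-private case makes the first and last points concrete (illustrative only: the population split, the threshold τ, and the per-user effective budget ε_eff = s_τ / n are assumptions for this sketch, not the paper's exact estimator).

```python
import numpy as np

n = 1_000
# Mixed population: half strict users, half nearly-public users.
eps = np.concatenate([np.full(n // 2, 0.1), np.full(n // 2, 10.0)])

eps_min = eps.min()                  # strictest individual demand
tau = 1.0
s_tau = np.minimum(eps, tau).sum()   # total budget mass after clipping
eps_eff = s_tau / n                  # effective per-user budget

scale_strict = 1.0 / (n * eps_min)   # Laplace scale if everyone is held
                                     # to the strictest demand
scale_thresh = 1.0 / (n * eps_eff)   # Laplace scale at the effective budget
ratio = scale_strict / scale_thresh  # 5.5x less noise from choosing the
                                     # budget level, with no per-user
                                     # customization at all
```

The gap between the two scales comes entirely from picking the effective budget; personalization can only improve on the thresholded scale by a constant factor.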

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners may achieve most benefits of differential privacy with simpler uniform-budget mechanisms.
  • This limitation might extend to other statistical queries beyond mean estimation.
  • Future work could explore whether similar thresholding suffices in non-additive noise settings or different data distributions.

Load-bearing premise

The analysis assumes standard additive-noise mechanisms for mean estimation and specific distributions of privacy requirements such as mixed public-private or two-level cases.
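The additive-noise baseline assumed throughout can be sketched as a standard Laplace mean estimator run at a single effective budget (a generic textbook mechanism, not the paper's personalized construction; the data bounds are an assumption of this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)

def private_mean(x, eps, lo=0.0, hi=1.0):
    """eps-DP mean of n values assumed to lie in [lo, hi].

    The mean of n bounded values has sensitivity (hi - lo) / n, so
    Laplace noise with scale (hi - lo) / (n * eps) suffices.
    """
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    n = len(x)
    return x.mean() + rng.laplace(scale=(hi - lo) / (n * eps))
```

At a generous budget the noise is negligible; at a strict one it dominates the estimate, which is why the choice of effective budget carries the utility.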

What would settle it

A counterexample where a fully personalized mechanism achieves super-constant-factor improvement in mean estimation utility over the thresholding baseline for standard additive noise would falsify the bounds.

Figures

Figures reproduced from arXiv: 2605.13503 by Edwige Cyffers, Juba Ziani.

Figure 1. Ratio between the unique-threshold estimator and the best affine operator for a combination […]
Figure 2. Ratio between the unique-threshold estimator and the best affine operator for two finite […]
read the original abstract

A key technical difficulty in differential privacy is selecting a privacy budget that satisfies privacy requirements while maximizing utility. A natural and well-studied workaround is to use personalized privacy budgets, which may differ across agents. In this paper, we show that personalized budgets come with major limitations and that for mean estimation, the dominant factor is not full personalization, but rather choosing the right effective privacy budget. This can be achieved through a simple thresholding operator that we describe. Compared with this thresholding baseline, the gains obtained by fully personalized mechanisms are limited. In particular, we precisely quantify the constant-factor improvement in settings with mixed private and public datasets and in private datasets with two levels of privacy requirements. We also establish upper bounds and identify regimes of maximal gain for arbitrary privacy requirements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that personalized differential privacy budgets have major limitations for mean estimation. It shows that a simple thresholding operator on the effective privacy budget matches or nearly matches the utility of fully personalized mechanisms under additive noise (Gaussian/Laplace), with only constant-factor gains from full personalization. This is quantified for mixed public-private datasets and two-level privacy requirements, with upper bounds and regimes of maximal gain identified for arbitrary requirements.

Significance. If the bounds hold, the result indicates that choosing an effective privacy budget via thresholding is the dominant factor for utility in mean estimation, rather than full personalization. This could simplify DP deployments in practice while providing concrete constant-factor comparisons that guide when personalization is worthwhile.

major comments (2)
  1. [§4 (Constant-factor improvements for mixed and two-level cases)] The central claim for mean estimation rests on additive-noise mechanisms and specific (two-level/mixed) privacy-requirement distributions. For arbitrary distributions with heavy tails or correlations between privacy requirements and data values, the noise-scale calculation and resulting utility gap may exceed the reported constant factors (see the reduction to effective epsilon via thresholding).
  2. [§5 (Upper bounds for arbitrary requirements)] The analysis does not address whether thresholding remains near-optimal once the mechanism is allowed to use data-dependent noise or public-data-assisted estimators; this is a load-bearing assumption for the generality of the upper bounds.
minor comments (1)
  1. [§3 (Thresholding baseline)] Clarify the exact definition and implementation of the thresholding operator in the main text, including how it interacts with the privacy requirement distribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. Our paper focuses on additive-noise mechanisms for mean estimation and demonstrates that a simple thresholding operator on privacy budgets achieves performance within constant factors of fully personalized mechanisms. We address the major comments point by point below.

read point-by-point responses
  1. Referee: [§4 (Constant-factor improvements for mixed and two-level cases)] The central claim for mean estimation rests on additive-noise mechanisms and specific (two-level/mixed) privacy-requirement distributions. For arbitrary distributions with heavy tails or correlations between privacy requirements and data values, the noise-scale calculation and resulting utility gap may exceed the reported constant factors (see the reduction to effective epsilon via thresholding).

    Authors: We agree that the constant-factor results in §4 are derived for the mixed public-private and two-level cases under additive noise. For fully arbitrary distributions, including heavy-tailed privacy requirements or correlations with data values, the gap could be larger than the constants we report. Our upper bounds in §5 provide a general characterization of the maximal gain from personalization, but we will add a clarifying paragraph in the revised manuscript noting that the explicit constant-factor comparisons are specific to the analyzed distributions while the thresholding reduction to an effective epsilon remains valid more broadly. revision: partial

  2. Referee: [§5 (Upper bounds for arbitrary requirements)] The analysis does not address whether thresholding remains near-optimal once the mechanism is allowed to use data-dependent noise or public-data-assisted estimators; this is a load-bearing assumption for the generality of the upper bounds.

    Authors: The upper bounds in §5 are established specifically for additive-noise mechanisms (Gaussian and Laplace), which is the standard setting for mean estimation under differential privacy. Data-dependent noise allocation or public-data-assisted estimators fall outside this model and would require a separate analysis; our contribution is to show that, within the additive-noise regime, thresholding on the effective privacy budget is near-optimal up to constants. We will explicitly state this scope limitation in the revised introduction and conclusion to avoid overgeneralization. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from standard DP definitions

full rationale

The paper derives its central claims on the limits of personalized DP budgets for mean estimation directly from standard additive-noise mechanisms (Gaussian/Laplace) and explicit privacy-requirement distributions (mixed public-private or two-level). The thresholding operator is introduced as an explicit baseline construction, with constant-factor bounds and upper bounds obtained via direct analysis of noise scales and utility gaps; no step reduces a prediction to a fitted parameter by construction, invokes a self-citation as the sole load-bearing justification, or renames a known result. The derivation remains independent of the target results and relies on externally verifiable DP primitives.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard differential privacy definitions and the mean estimation query without introducing new entities or free parameters; the analysis uses established noise-addition mechanisms.

axioms (2)
  • standard math Standard definition of epsilon-differential privacy
    Invoked as the foundation for all privacy guarantees and utility comparisons.
  • domain assumption Mean estimation as the central query with additive noise mechanisms
    The entire analysis and thresholding operator are developed specifically for this setting.

pith-pipeline@v0.9.0 · 5418 in / 1273 out tokens · 50667 ms · 2026-05-14T18:12:18.227737+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1] Krishna Acharya, Franziska Boenisch, Rakshit Naidu, and Juba Ziani. Personalized differential privacy for ridge regression under output perturbation. Naval Research Logistics (NRL), 73(4):525–537, 2026.
  2. [2] Mohamed Alaggan, Sébastien Gambs, and Anne-Marie Kermarrec. Heterogeneous differential privacy. arXiv preprint arXiv:1504.06998, 2015.
  3. [3] Anita Allen. Unpopular Privacy: What Must We Hide? OUP USA, New York, US, 2011.
  4. [4] Noga Alon, Raef Bassily, and Shay Moran. Limits of private learning with access to public data. Advances in Neural Information Processing Systems, 32, 2019.
  5. [5] Raef Bassily, Kate Donahue, Diptangshu Sen, Annuo Zhao, and Juba Ziani. Data sharing with endogenous choices over differential privacy levels. arXiv preprint arXiv:2602.09357, 2026.
  6. [6] Alex Bie, Gautam Kamath, and Vikrant Singhal. Private estimation with public data. Advances in Neural Information Processing Systems, 35:18653–18666, 2022.
  7. [7] Adam Block, Mark Bun, Rathin Desai, Abhishek Shetty, and Zhiwei S. Wu. Oracle-efficient differentially private learning with public data. Advances in Neural Information Processing Systems, 37:113191–113233, 2024.
  8. [8] Franziska Boenisch, Christopher Mühl, Adam Dziedzic, Roy Rinberg, and Nicolas Papernot. Have it your way: Individualized privacy assignment for DP-SGD. Advances in Neural Information Processing Systems, 36:19073–19103, 2023.
  9. [9] Franziska Boenisch, Christopher Mühl, Roy Rinberg, Jannis Ihrig, and Adam Dziedzic. Individualized PATE: Differentially private machine learning with individual privacy guarantees. arXiv preprint arXiv:2202.10517, 2022.
  10. [10] Mark Bun and Thomas Steinke. Average-case averages: Private algorithms for smooth sensitivity and mean estimation. Advances in Neural Information Processing Systems, 32, 2019.
  11. [11] Syomantak Chaudhuri and Thomas A. Courtade. Mean estimation under heterogeneous privacy: Some privacy can be free. 2023 IEEE International Symposium on Information Theory (ISIT), pages 1639–1644, 2023.
  12. [12] Syomantak Chaudhuri and Thomas A. Courtade. Managing correlations in data and privacy demand. In Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, CCS '25, pages 2384–2398. ACM, November 2025.
  13. [13] Syomantak Chaudhuri, Konstantin Miagkov, and Thomas A. Courtade. Mean estimation under heterogeneous privacy demands. IEEE Transactions on Information Theory, 71(2):1362–1375, February 2025.
  14. [14] Rachel Cummings and David Durfee. Individual sensitivity preprocessing for data privacy. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 528–547. SIAM, 2020.
  15. [15] Rachel Cummings, Hadi Elzayn, Emmanouil Pountourakis, Vasilis Gkatzelis, and Juba Ziani. Optimal data acquisition with privacy-aware agents. In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 210–224. IEEE, 2023.
  16. [16] Rachel Cummings, Katrina Ligett, Aaron Roth, Zhiwei Steven Wu, and Juba Ziani. Accuracy for sale: Aggregating data with a variance constraint. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 317–324, 2015.
  17. [17] Edwige Cyffers. Setting epsilon is not the issue in differential privacy. In Proceedings of the 39th International Conference on Neural Information Processing Systems, 2025.
  18. [18] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to Sensitivity in Private Data Analysis, pages 265–284. Springer Berlin Heidelberg, 2006.
  19. [19] Alireza Fallah, Ali Makhdoumi, Azarakhsh Malekian, and Asuman Ozdaglar. Optimal and differentially private data acquisition: Central and local mechanisms. Operations Research, 72(3):1105–1123, 2024.
  20. [20] Vitaly Feldman and Tijana Zrnic. Individual privacy accounting via a Rényi filter. In Advances in Neural Information Processing Systems, volume 33, 2020.
  21. [21] Nina Gerber, Paul Gerber, and Melanie Volkamer. Explaining the privacy paradox: A systematic review of literature investigating privacy attitude and behavior. Computers and Security, 77:226–261, August 2018.
  22. [22] Arpita Ghosh and Katrina Ligett. Privacy and coordination: Computing on databases with endogenous participation. In Proceedings of the Fourteenth ACM Conference on Electronic Commerce, pages 543–560, 2013.
  23. [23] Arpita Ghosh and Aaron Roth. Selling privacy at auction. In ACM Conference on Electronic Commerce, pages 199–208, 2011.
  24. [24] Jason D. Hartline and Tim Roughgarden. Simple versus optimal mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 225–234, 2009.
  25. [25] Zachary Jorgensen, Ting Yu, and Graham Cormode. Conservative or liberal? Personalized differential privacy. In IEEE International Conference on Data Engineering (ICDE), pages 1023–1034, 2015.
  26. [26] Andrew Lowy, Zeman Li, Tianjian Huang, and Meisam Razaviyayn. Optimal differentially private model training with public data. In Proceedings of the 41st International Conference on Machine Learning, ICML '24. JMLR.org, 2024.
  27. [27] Helen Nissenbaum. Privacy as contextual integrity. Washington Law Review, 79, May 2004.
  28. [28] Kobbi Nissim, Salil Vadhan, and David Xiao. Redrawing the boundaries on purchasing data from privacy-sensitive individuals. In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science, pages 411–422, 2014.
  29. [29] Krishna Pillutla, Jalaj Upadhyay, Christopher A. Choquette-Choo, Krishnamurthy Dj Dvijotham, Arun Ganesh, Monika Henzinger, Jonathan Katz, Ryan McKenna, H. B. McMahan, Keith Rush, Thomas Steinke, and Abhradeep Thakurta. Correlated noise mechanisms for differentially private learning. arXiv preprint arXiv:2506.08201, 2025.
  30. [30] Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, and Abhradeep Guha Thakurta. How to DP-fy ML: A practical guide to machine learning with differential privacy. Journal of Artificial Intelligence Research, 77:1113–1201, 2023.
  31. [31] Enayat Ullah, Michael Menart, Raef Bassily, Cristóbal Guzmán, and Raman Arora. Public-data assisted private stochastic optimization: Power and limitations. In Advances in Neural Information Processing Systems, volume 37, pages 20383–20427, 2024.
  32. [32] Jun Wang and Zhi-Hua Zhou. Differentially private learning with small public data. Proceedings of the AAAI Conference on Artificial Intelligence, 34(04):6219–6226, April 2020.