arxiv: 2310.16260 · v3 · submitted 2023-10-25 · 📊 stat.ME

Differentially Private Estimation and Inference in High-Dimensional Regression with FDR Control

Pith reviewed 2026-05-24 06:49 UTC · model grok-4.3

classification 📊 stat.ME
keywords differentially private estimation · high-dimensional regression · FDR control · sparsity selection · debiased inference · multiple testing · linear models · privacy preservation

The pith

DP-BIC, debiased estimators, and FDR procedures enable practical differentially private estimation and inference in sparse high-dimensional linear regression without prior sparsity knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops three tools for high-dimensional linear regression under differential privacy: a DP-BIC that selects the sparsity level automatically, a DP debiased algorithm that supports inference on selected coefficients by exploiting the model's sparsity, and a DP multiple testing procedure that controls the false discovery rate during feature selection. Together they remove the requirement, imposed by previous DP approaches, that the sparsity parameter be known in advance. This matters because the methods allow private analysis of high-dimensional data with sparse signals, such as in genomics or finance, while still producing valid estimates, confidence intervals, and selected predictors. The claims rest on theoretical guarantees for the privacy-utility trade-off and are checked through simulations and real-data examples.

Core claim

By introducing a differentially private Bayesian Information Criterion for sparsity selection, adapting debiased estimators to the private setting, and constructing a multiple-testing rule that preserves FDR control, the procedures achieve valid estimation, inference on individual parameters, and private feature selection in sparse high-dimensional linear models without requiring the sparsity level to be supplied beforehand.
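To make criterion-based sparsity selection concrete, here is a minimal Python sketch of the classical, non-private BIC rule applied to a path of candidate sparsity levels. It is not the paper's DP-BIC, whose exact form is not given on this page and which presumably modifies the criterion to account for privacy noise. The function name `select_sparsity_bic` and the input format (residual sums of squares indexed by candidate sparsity) are assumptions made for illustration.

```python
import math

def select_sparsity_bic(rss_by_sparsity, n):
    """Return the candidate sparsity s minimizing the classical BIC.

    rss_by_sparsity: dict mapping candidate sparsity s -> residual sum of
        squares of an s-sparse fit (hypothetical input format).
    n: sample size.
    BIC(s) = n * log(RSS_s / n) + s * log(n)  (Gaussian errors, unknown variance).
    """
    best_s, best_bic = None, math.inf
    for s, rss in sorted(rss_by_sparsity.items()):
        bic = n * math.log(rss / n) + s * math.log(n)
        if bic < best_bic:
            best_s, best_bic = s, bic
    return best_s

# Toy path: the fit improves sharply up to s = 3, then barely at all.
rss_path = {1: 400.0, 2: 250.0, 3: 100.0, 4: 99.0, 5: 98.5}
chosen = select_sparsity_bic(rss_path, n=200)
```

On this toy path the penalty term `s * log(n)` outweighs the small fit gains beyond three coefficients, so the rule stops there.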

What carries the argument

The DP-BIC for automatic sparsity selection together with the DP debiased algorithm that exploits sparsity for private inference and the DP multiple testing procedure that maintains FDR control.
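For orientation, the standard non-private debiased Lasso that the DP algorithm adapts can be written as below. This is the textbook form from the debiased-inference literature, not the paper's private version. The notation is assumed here: $\hat{\beta}$ is the Lasso estimate, $\hat{\Sigma} = X^{\top}X/n$ is the sample covariance, $\hat{\Theta}$ is an approximate inverse of $\hat{\Sigma}$, and $\sigma^2$ is the noise variance. The additional noise terms that a private version needs are not reproduced.

```latex
\hat{\beta}^{\mathrm{db}}
  = \hat{\beta} + \frac{1}{n}\,\hat{\Theta} X^{\top}\bigl(y - X\hat{\beta}\bigr),
\qquad
\sqrt{n}\,\bigl(\hat{\beta}^{\mathrm{db}}_j - \beta_j\bigr)
  \approx N\!\bigl(0,\ \sigma^{2}\,[\hat{\Theta}\hat{\Sigma}\hat{\Theta}^{\top}]_{jj}\bigr).
```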

If this is right

  • Sparsity selection becomes feasible in private high-dimensional regression without external knowledge of the number of non-zero coefficients.
  • Inference on individual regression parameters can be performed while satisfying differential privacy by leveraging model sparsity.
  • Significant predictors can be identified with FDR control even after privacy noise is added.
  • The same framework supports both point estimation and multiple-testing decisions in one private pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach may extend to generalized linear models if the debiasing step can be generalized beyond the squared-error loss of the linear model.
  • Calibration of the privacy noise scale could be tuned further to improve power in the multiple-testing step without losing FDR control.
  • Real-world deployment would require checking whether the privacy budget allocation across selection, inference, and testing steps can be optimized for specific data regimes.
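The budget-allocation question can be illustrated with basic sequential composition, under which the ε and δ values of successive private steps add up. The following is a minimal Python sketch assuming a simple proportional split. The function name `split_budget`, the weights, and the three step labels are illustrative and not taken from the paper, which may rely on a different composition argument.

```python
def split_budget(eps_total, delta_total, weights):
    """Split an (eps, delta) budget across steps in proportion to weights.

    Under basic sequential composition, running step k with
    (eps_k, delta_k)-DP for every k yields (sum eps_k, sum delta_k)-DP overall.
    """
    total = sum(weights.values())
    return {
        step: (eps_total * w / total, delta_total * w / total)
        for step, w in weights.items()
    }

# Hypothetical allocation: inference receives half of the budget.
budget = split_budget(
    eps_total=1.0,
    delta_total=1e-5,
    weights={"selection": 1, "inference": 2, "testing": 1},
)
eps_spent = sum(eps for eps, _ in budget.values())
delta_spent = sum(delta for _, delta in budget.values())
```

Changing the weights shifts noise between the steps while the totals `eps_spent` and `delta_spent` stay at the overall budget.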

Load-bearing premise

The underlying linear model is sparse and the privacy noise does not invalidate the debiased estimators or the FDR guarantees.

What would settle it

Run the DP procedures repeatedly on simulated sparse regression data with known ground-truth coefficients. Then check two things: whether the empirical coverage of the reported confidence intervals matches the nominal level, and whether the false discovery proportion, averaged over replications, stays at or below the nominal FDR level.
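The bookkeeping for such a check can be sketched in a few lines of Python. The helper names below are hypothetical, and the toy inputs are hand-made rather than output of any DP procedure. The sketch only shows how the false discovery proportion, power, and coverage would be tallied once selections and intervals are available.

```python
def false_discovery_proportion(selected, true_support):
    """FDP = (# selected features outside the true support) / max(# selected, 1)."""
    selected, true_support = set(selected), set(true_support)
    return len(selected - true_support) / max(len(selected), 1)

def power(selected, true_support):
    """Fraction of truly non-zero coefficients that were selected."""
    selected, true_support = set(selected), set(true_support)
    return len(selected & true_support) / max(len(true_support), 1)

def coverage(intervals, beta_true):
    """Fraction of (lower, upper) intervals containing the true coefficient."""
    hits = sum(lo <= b <= hi for (lo, hi), b in zip(intervals, beta_true))
    return hits / len(beta_true)

def empirical_fdr(selections, true_support):
    """Average FDP over replications; this estimates the FDR."""
    fdps = [false_discovery_proportion(s, true_support) for s in selections]
    return sum(fdps) / len(fdps)

# Toy check with hand-made replications (not real output of any DP procedure).
true_support = {0, 1, 2}
selections = [{0, 1, 2}, {0, 1, 7}, {0, 2, 5, 9}, set()]
fdr_hat = empirical_fdr(selections, true_support)
cov_hat = coverage([(0.8, 1.2), (-0.1, 0.3), (1.9, 2.4)], [1.0, 0.5, 2.0])
```

In a real study, `fdr_hat` would be compared against the nominal FDR level and `cov_hat` against the nominal confidence level, each over many replications.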

Figures

Figures reproduced from arXiv: 2310.16260 by Linjun Zhang, Sai Li, Xintao Xia, Zhanrui Cai.

Figure 1: 95% confidence intervals for one realization of DP correction under the AR co…
Figure 2: Empirical FDRs and powers of Algorithm 5 (DP) and the non-private algorithm…
Figure 3: Empirical FDRs and powers of Algorithm 5 (DP) and the non-private algorithm…
Figure 4: Simulation results evaluating the performance of the DP-BIC procedure for…
Figure 5: Simulation results evaluating the performance of the DP-BIC procedure under…
Figure 6: The 95% confidence interval of OLS, nonprivate debiased Lasso (NP), and pro…
Figure 7: Numbers of discoveries for Algorithm 5 (DP), the non-private version (NP),…
Original abstract

This paper proposes new methodologies for conducting practical differentially private (DP) estimation and inference in high-dimensional linear regression. We first introduce a DP Bayesian Information Criterion (DP-BIC) for selecting the unknown sparsity parameter in differentially private sparse linear regression (DP-SLR), eliminating the need for prior knowledge of model sparsity, which is a requisite in the existing literature. Next, we develop the DP debiased algorithm that enables privacy-preserving inference on a particular subset of regression parameters. Our proposed method enables privacy-preserving inference on the regression parameters by leveraging the inherent sparsity of high-dimensional linear regression models. Additionally, we address private feature selection by considering multiple testing in high-dimensional linear regression by introducing a DP multiple testing procedure that controls the false discovery rate (FDR). This allows for accurate and privacy-preserving identification of significant predictors in the regression model. Through extensive simulations and real data analyses, we demonstrate the effectiveness of our proposed methods in conducting inference for high-dimensional linear models while safeguarding privacy and controlling the FDR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes DP-BIC for selecting the sparsity parameter in differentially private sparse linear regression without prior knowledge, a DP debiased algorithm for privacy-preserving inference on regression parameters by leveraging sparsity, and a DP multiple testing procedure that controls FDR for identifying significant predictors. Effectiveness is shown via simulations and real-data analyses under stated sparsity and privacy regimes.

Significance. If the concentration bounds, asymptotic normality after noise addition, and FDR control hold as claimed, the work supplies practical, implementable tools for private high-dimensional inference with explicit algorithms and simulation support. This addresses a gap between theoretical DP methods and usable inference/FDR procedures in sparse regression settings.

minor comments (3)
  1. [Abstract] The phrase 'a particular subset of regression parameters' is vague; the manuscript should state explicitly which coordinates receive inference guarantees and under what conditions on the support size.
  2. [Simulations] The simulation section should report the exact privacy parameter values (ε, δ) and sparsity levels used in each table/figure so that the FDR control and coverage results can be directly reproduced.
  3. [DP debiased algorithm] Notation for the noise scale in the DP debiased step should be unified across the algorithm description and the concentration lemma that follows it.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work on DP-BIC for sparsity selection, the DP-debiased inference procedure, and the DP-FDR control method in high-dimensional linear regression. We appreciate the recommendation for minor revision and the recognition that the paper supplies practical tools with explicit algorithms and simulation support.

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces DP-BIC for sparsity selection, a DP debiased estimator, and a DP multiple-testing procedure for FDR control. These are constructed from standard differential privacy mechanisms (noise addition with explicit concentration bounds) and established high-dimensional regression tools (debiased Lasso, BH procedure). No equation reduces a claimed prediction to a fitted input by construction, no self-citation chain is load-bearing for the central guarantees, and no ansatz or uniqueness result is imported from the authors' prior work. The derivation chain remains self-contained against external benchmarks.
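The two standard building blocks named here, noise addition and the BH procedure, can be sketched in Python as follows. This is not the paper's Algorithm 5. It shows only the classical Gaussian mechanism calibration, which is valid for 0 < ε < 1, and the plain Benjamini-Hochberg step-up rule. The function names and toy inputs are assumptions for illustration.

```python
import math
import random

def gaussian_mechanism(value, sensitivity, eps, delta, rng=random):
    """Release value + N(0, sigma^2) using the classical calibration
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / eps (needs 0 < eps < 1)."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("classical calibration assumes 0 < eps < 1 and 0 < delta < 1")
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    return value + rng.gauss(0.0, sigma), sigma

def benjamini_hochberg(p_values, q):
    """Return the indices rejected by the BH step-up procedure at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # largest rank whose p-value lies under the BH line
    return sorted(order[:k])

# Toy inputs; a seeded generator keeps the noisy release reproducible.
noisy_mean, sigma = gaussian_mechanism(
    0.42, sensitivity=0.01, eps=0.5, delta=1e-5, rng=random.Random(0)
)
rejected = benjamini_hochberg([0.001, 0.008, 0.039, 0.035, 0.6], q=0.05)
```

The toy p-values illustrate the step-up behavior: the third-smallest p-value misses its own threshold, yet it is still rejected because the fourth-smallest falls under the BH line.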

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claims rest on standard domain assumptions of sparsity in high-dimensional regression and compatibility of privacy noise with statistical inference procedures; no free parameters or invented entities are identifiable from the abstract.

axioms (2)
  • domain assumption: The high-dimensional linear regression model is sparse.
    Required for sparsity selection and inference on a subset of parameters.
  • domain assumption: Differential privacy noise addition preserves the validity of the debiased estimators and of FDR control.
    Core premise enabling the DP versions of the algorithms.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Differentially private hypothesis testing in survival analysis

    math.ST 2026-05 unverdicted novelty 7.0

    Initiates finite-sample theory for differentially private hypothesis testing in survival analysis, with private tests for Cox models and cumulative hazards plus minimax bounds.
