pith. sign in

arxiv: 2605.16906 · v1 · pith:CJTGSYI5new · submitted 2026-05-16 · 🧮 math.ST · stat.ME· stat.TH

Differentially private hypothesis testing in survival analysis

Pith reviewed 2026-05-19 19:09 UTC · model grok-4.3

classification 🧮 math.ST stat.MEstat.TH
keywords differential privacyhypothesis testingsurvival analysisCox regressionright-censored datacumulative hazardminimax lower bounds
0
0 comments X p. Extension
pith:CJTGSYI5 Add to your LaTeX paper What is a Pith Number?
\usepackage{pith}
\pithnumber{CJTGSYI5}

Prints a linked pith:CJTGSYI5 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more

The pith

Differentially private tests for Cox coefficients and cumulative hazards achieve finite-sample guarantees in survival analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first finite-sample theory for adding differential privacy to hypothesis testing on right-censored survival data. It constructs private versions of the partial-likelihood-ratio and score tests for Cox regression coefficients, together with a calibration step that sets the rejection threshold while keeping the privacy guarantee. It also supplies a private distributed two-sample test for comparing cumulative hazard functions. The constructions come with proofs of privacy, valid type-I error control at finite samples, and matching minimax lower bounds that show when privacy is essentially free and when it becomes the dominant cost. Readers care because survival data often records sensitive medical or lifetime events, so these results indicate precisely where privacy protections can be added without destroying the ability to detect effects.

Core claim

We initiate a finite-sample theory of private hypothesis testing in survival analysis applications. For Cox regression coefficients, we develop private partial-likelihood-ratio and score-type tests, including a private calibration procedure for the rejection threshold. For cumulative hazard functions, we propose a private distributed two-sample test. Across these problems, we prove differential privacy and finite-sample testing guarantees, as well as minimax lower bounds.

What carries the argument

Private partial-likelihood-ratio and score-type tests equipped with a private calibration procedure for rejection thresholds, together with a private distributed two-sample test for cumulative hazard functions.

If this is right

  • Privacy is statistically negligible once the privacy budget exceeds a threshold that scales with sample size and censoring rate.
  • In high-privacy regimes the testing rate is governed by the privacy noise rather than the usual statistical fluctuation.
  • Minimax lower bounds identify the precise regimes where further improvement is impossible.
  • Optimal private rates for some semiparametric survival models remain open.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same calibration technique could be adapted to other semiparametric models that rely on estimating equations.
  • Practical deployment would require checking how the added privacy noise interacts with heavy censoring or model misspecification.
  • The distributed two-sample construction suggests a route for private meta-analysis across hospitals without sharing raw records.

Load-bearing premise

The underlying survival data follows standard right-censored models such as the Cox proportional hazards model, and the private calibration procedure for rejection thresholds can be implemented without invalidating the finite-sample guarantees.

What would settle it

A Monte Carlo experiment on simulated right-censored data in which the private test maintains the nominal type-I error rate under the null while its power curve lies close to the non-private curve for moderate privacy budgets.

Figures

Figures reproduced from arXiv: 2605.16906 by Elly K. H. Hung, Yi Yu.

Figure 1
Figure 1. Figure 1: Proportion of rejections from applying the binary likelihood ratio test from Proposition [PITH_FULL_IMAGE:figures/full_fig_p010_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Proportion of rejections from applying the score test from Corollary [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Proportion of rejections from applying the binary hypothesis test from Proposition [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Panel A shows the proportion of rejections from applying the two-sample test from [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗
read the original abstract

Survival analysis is widely used in applications involving sensitive individual-level data, yet differentially private hypothesis testing for right-censored data remains largely undeveloped. We initiate a finite-sample theory of private hypothesis testing in survival analysis applications. For Cox regression coefficients, we develop private partial-likelihood-ratio and score-type tests, including a private calibration procedure for the rejection threshold. For cumulative hazard functions, we propose a private distributed two-sample test. Across these problems, we prove differential privacy and finite-sample testing guarantees, as well as minimax lower bounds. Our results identify when privacy is statistically negligible, when it dominates the testing rate, and where optimal private rates for testing in semiparametric survival models remain open. This theoretical analysis is accompanied by numerical experiments on simulated data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript initiates a finite-sample theory of differentially private hypothesis testing for right-censored survival data. For Cox regression coefficients it constructs private partial-likelihood-ratio and score-type tests together with a private calibration procedure that sets the rejection threshold. For cumulative hazard functions it proposes a private distributed two-sample test. Differential privacy, finite-sample type-I and type-II error bounds, and minimax lower bounds are proved for these procedures; regimes in which privacy is statistically negligible versus dominant are identified, and the theoretical results are accompanied by numerical experiments on simulated data.

Significance. If the finite-sample type-I error control holds, the work supplies the first rigorous private testing theory for semiparametric survival models, a setting that routinely handles sensitive medical data. The explicit identification of privacy-negligible and privacy-dominated regimes, together with matching lower bounds, gives practitioners concrete guidance. The proofs of differential privacy and the finite-sample guarantees constitute the primary technical contribution.

major comments (1)
  1. [§4.3] §4.3 (private calibration of the rejection threshold for the partial-likelihood-ratio test): the finite-sample type-I error bound is claimed to hold under random right-censoring, yet the proof does not explicitly compose the privacy noise (added to the observed information or to the quantile estimator) with the randomness of the censoring times and the resulting random observed information matrix. Because both the test statistic and its null distribution are functions of the realized censoring pattern, a detailed argument showing that the added privacy mechanism does not inflate the type-I error beyond the stated bound is required to support the central finite-sample guarantee.
minor comments (2)
  1. [§2.1] §2.1: the notation for the privacy budget (ε,δ) should be stated once at the beginning and used consistently; the current text alternates between (ε,0)-DP and (ε,δ)-DP without explicit justification for the choice in each theorem.
  2. [Table 1] Table 1: the column headers for the empirical type-I error rates do not indicate the number of Monte-Carlo replications; adding this information would allow readers to assess the precision of the reported frequencies.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript and for the constructive major comment on the finite-sample type-I error analysis. We address the point raised in §4.3 below.

read point-by-point responses
  1. Referee: [§4.3] §4.3 (private calibration of the rejection threshold for the partial-likelihood-ratio test): the finite-sample type-I error bound is claimed to hold under random right-censoring, yet the proof does not explicitly compose the privacy noise (added to the observed information or to the quantile estimator) with the randomness of the censoring times and the resulting random observed information matrix. Because both the test statistic and its null distribution are functions of the realized censoring pattern, a detailed argument showing that the added privacy mechanism does not inflate the type-I error beyond the stated bound is required to support the central finite-sample guarantee.

    Authors: We appreciate the referee's request for an explicit composition argument. The proof in the manuscript proceeds by conditioning on the realized censoring times and the resulting observed information matrix: the private test statistic and the calibrated quantile are both constructed conditionally on this random matrix, so that the conditional type-I error is bounded by the target level for any fixed censoring pattern. The unconditional bound then follows directly by integrating the conditional bound with respect to the distribution of the censoring times (via the law of total probability). Because the privacy mechanism is applied after the censoring pattern is observed, it does not alter this conditional control. To make the argument fully transparent and to address the referee's concern directly, we will revise the manuscript by inserting a short clarifying paragraph (or subsection) that explicitly states the conditioning step, invokes the law of total probability, and confirms that the privacy noise does not inflate the unconditional type-I error beyond the stated finite-sample bound. revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on independent proofs for private tests in censored models

full rationale

The paper develops private partial-likelihood-ratio and score-type tests plus a private calibration procedure for rejection thresholds in Cox models, along with a private distributed two-sample test for cumulative hazards. It claims to prove differential privacy, finite-sample type-I error control, and minimax lower bounds directly from the right-censored data model and standard DP mechanisms. No quoted step reduces a derived quantity to a fitted input by construction, invokes a self-citation as the sole justification for a uniqueness or ansatz claim, or renames a known empirical pattern. The derivation chain is therefore self-contained against external benchmarks such as classical Cox partial likelihood and DP composition theorems.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based on abstract only; limited visibility into specific parameters or assumptions beyond standard survival models.

free parameters (1)
  • privacy budget epsilon
    Differential privacy mechanisms require choosing or calibrating a privacy parameter that controls the noise level and may be tuned in the private calibration procedure.
axioms (1)
  • domain assumption Data follows right-censored survival model with Cox proportional hazards structure
    Tests are developed specifically for Cox regression coefficients and cumulative hazards under right-censoring.

pith-pipeline@v0.9.0 · 5650 in / 1212 out tokens · 37956 ms · 2026-05-19T19:09:47.699251+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 3 internal anchors

  1. [1]

    Differentially Private Estimation and Inference in High-Dimensional Regression with FDR Control

    Private estimation and inference in high-dimensional regression with fdr control , author=. arXiv preprint arXiv:2310.16260 , year=

  2. [2]

    2024 , publisher=

    Statistical inference , author=. 2024 , publisher=

  3. [3]

    arXiv preprint arXiv:2505.24811 , year=

    Locally Differentially Private Two-Sample Testing , author=. arXiv preprint arXiv:2505.24811 , year=

  4. [4]

    2011 , publisher=

    Survival analysis , author=. 2011 , publisher=

  5. [5]

    arXiv preprint arXiv:2508.04800 , year=

    Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform , author=. arXiv preprint arXiv:2508.04800 , year=

  6. [6]

    Private Approximations of the 2nd-Moment Matrix Using Existing Techniques in Linear Regression

    Private approximations of the 2nd-moment matrix using existing techniques in linear regression , author=. arXiv preprint arXiv:1507.00056 , year=

  7. [7]

    International Conference on Machine Learning , pages=

    Differentially private ordinary least squares , author=. International Conference on Machine Learning , pages=. 2017 , organization=

  8. [8]

    2012 IEEE 53rd annual symposium on foundations of computer science , pages=

    The johnson-lindenstrauss transform itself preserves differential privacy , author=. 2012 IEEE 53rd annual symposium on foundations of computer science , pages=. 2012 , organization=

  9. [9]

    The Annals of Statistics , volume=

    Differentially private inference via noisy optimization , author=. The Annals of Statistics , volume=. 2023 , publisher=

  10. [10]

    arXiv preprint arXiv:2402.07131 , year=

    Resampling methods for private statistical inference , author=. arXiv preprint arXiv:2402.07131 , year=

  11. [11]

    Andersen, P. K. and Gill, R. D. , title =. Annals of Statistics , year =

  12. [12]

    Journal of the American Statistical Association , volume=

    Privacy-Preserving Parametric Inference: A Case for Robust Statistics , author=. Journal of the American Statistical Association , volume=. 2020 , publisher=

  13. [13]

    Proceedings of the

    Convergence rates for differentially private statistical estimation , author=. Proceedings of the... International Conference on Machine Learning. International Conference on Machine Learning , volume=

  14. [14]

    The Structure of Optimal Private Tests for Simple Hypotheses , booktitle =

    Cl. The Structure of Optimal Private Tests for Simple Hypotheses , booktitle =. 2019 , doi =

  15. [15]

    Olivier Guilbaud , journal =. Exact

  16. [16]

    Proceedings of The 27th International Conference on Artificial Intelligence and Statistics , pages =

    Unified Transfer Learning in High-Dimensional Linear Regression , author =. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics , pages =. 2024 , editor =

  17. [17]

    Fleming and Judith R

    Thomas R. Fleming and Judith R. O'Fallon and Peter C. O'Brien and David P. Harrington , journal =. Modified Kolmogorov-Smirnov Test Procedures with Application to Arbitrarily Right-Censored Data , volume =

  18. [18]

    and Ying, Zhiliang and Yu, Y

    Huang, Jian and Sun, T. and Ying, Zhiliang and Yu, Y. and Zhang, Cun-Hui , journal=. Oracle inequalities for the

  19. [19]

    Elly K. H. Hung and Yi Yu , year=. Optimal

  20. [20]

    Finite Sample Differentially Private Confidence Intervals

    Finite sample differentially private confidence intervals , author=. arXiv preprint arXiv:1711.03908 , year=

  21. [21]

    2018 , eprint=

    Differentially Private Confidence Intervals for Empirical Risk Minimization , author=. 2018 , eprint=

  22. [22]

    Optimal federated learning for non- parametric regression with heterogeneous distributed differential privacy constraints.arXiv preprint arXiv:2406.06755,

    Optimal federated learning for nonparametric regression with heterogeneous distributed differential privacy constraints , author=. arXiv preprint arXiv:2406.06755 , year=

  23. [23]

    Tony Cai, Abhinav Chakraborty, and Lasse Vuursteen

    Federated nonparametric hypothesis testing with differential privacy constraints: Optimal rates and adaptive tests , author=. arXiv preprint arXiv:2406.06749 , year=

  24. [24]

    Advances in Neural Information Processing Systems , volume=

    Optimal private and communication constraint distributed goodness-of-fit testing for discrete distributions in the large sample regime , author=. Advances in Neural Information Processing Systems , volume=

  25. [25]

    Proceedings of Thirty Fifth Conference on Learning Theory , pages =

    Private High-Dimensional Hypothesis Testing , author =. Proceedings of Thirty Fifth Conference on Learning Theory , pages =. 2022 , editor =

  26. [26]

    Journal of Nonparametric Statistics , volume=

    Remember the curse of dimensionality: The case of goodness-of-fit testing in arbitrary dimension , author=. Journal of Nonparametric Statistics , volume=. 2018 , publisher=

  27. [27]

    Theory of Probability and Its Applications , volume =

    Bentkus, Vidmantas , title =. Theory of Probability and Its Applications , volume =. 2003 , doi =

  28. [28]

    2011 , eprint=

    A tail inequality for quadratic forms of subgaussian random vectors , author=. 2011 , eprint=

  29. [29]

    2020 , eprint=

    The Sample Complexity of Robust Covariance Testing , author=. 2020 , eprint=

  30. [30]

    High-dimensional CLT for Sums of Non-degenerate Random Vectors: n^

    Arun Kumar Kuchibhotla and Alessandro Rinaldo , year=. High-dimensional CLT for Sums of Non-degenerate Random Vectors: n^. 2009.13673 , archivePrefix=

  31. [31]

    arXiv preprint arXiv:2412.20542 , year=

    On the Missing Factor in Some Concentration Inequalities for Martingales , author=. arXiv preprint arXiv:2412.20542 , year=

  32. [32]

    International Conference on Machine Learning , pages=

    The test of tests: A framework for differentially private hypothesis testing , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  33. [33]

    Journal of the Royal Statistical Society: Series B (Methodological) , volume=

    Regression models and life-tables , author=. Journal of the Royal Statistical Society: Series B (Methodological) , volume=. 1972 , publisher=

  34. [34]

    Advances in Neural Information Processing Systems , volume=

    Differentially private testing of identity and closeness of discrete distributions , author=. Advances in Neural Information Processing Systems , volume=

  35. [35]

    arXiv preprint arXiv:2208.06803 , year=

    Differentially private hypothesis testing with the subsampled and aggregated randomized response mechanism , author=. arXiv preprint arXiv:2208.06803 , year=

  36. [36]

    Proceedings of the 5th Machine Learning for Healthcare Conference , pages =

    Differentially Private Survival Function Estimation , author =. Proceedings of the 5th Machine Learning for Healthcare Conference , pages =. 2020 , editor =

  37. [37]

    Advances in Neural Information Processing Systems , volume=

    Private hypothesis selection , author=. Advances in Neural Information Processing Systems , volume=

  38. [38]

    Differentially private

    Awan, Jordan and Wang, Yue , journal=. Differentially private. 2025 , publisher=

  39. [39]

    Alabi and Salil P

    Daniel G. Alabi and Salil P. Vadhan , title =. Journal of Machine Learning Research , year =

  40. [40]

    arXiv preprint arXiv:2310.19043 , year=

    Differentially private permutation tests: Applications to kernel methods , author=. arXiv preprint arXiv:2310.19043 , year=

  41. [41]

    Journal of Computational and Graphical Statistics , volume=

    Differentially Private Significance Tests for Regression Coefficients , author=. Journal of Computational and Graphical Statistics , volume=. 2019 , publisher=

  42. [42]

    The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

    When data can't meet: estimating correlation across privacy barriers , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

  43. [43]

    Foundations and trends

    The algorithmic foundations of differential privacy , author=. Foundations and trends. 2014 , publisher=

  44. [44]

    Journal of the American Statistical Association , volume =

    Ilmun Kim and Antonin Schrab , title =. Journal of the American Statistical Association , year =. doi:10.1080/01621459.2025.2610033 , note =

  45. [45]

    2014 IEEE 55th annual symposium on foundations of computer science , pages=

    Private empirical risk minimization: Efficient algorithms and tight error bounds , author=. 2014 IEEE 55th annual symposium on foundations of computer science , pages=. 2014 , organization=

  46. [46]

    The Annals of Statistics , volume=

    The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy , author=. The Annals of Statistics , volume=. 2021 , publisher=

  47. [47]

    Proceedings of the 34th International Conference on Machine Learning , series =

    Priv’IT: Private and Sample Efficient Identity Testing , author =. Proceedings of the 34th International Conference on Machine Learning , series =. 2017 , publisher =

  48. [48]

    Advances in Neural Information Processing Systems , volume=

    Differentially private uniformly most powerful tests for binomial data , author=. Advances in Neural Information Processing Systems , volume=

  49. [49]

    Annals of Data Science , volume=

    A survey on differential privacy for medical data analysis , author=. Annals of Data Science , volume=. 2024 , publisher=

  50. [50]

    Differentially Private Regression for Discrete-Time Survival Analysis , year =

    Nguy\^. Differentially Private Regression for Discrete-Time Survival Analysis , year =. doi:10.1145/3132847.3132928 , booktitle =

  51. [51]

    Local differential privacy in survival analysis using private failure indicators , journal =

    Maxime Eg. Local differential privacy in survival analysis using private failure indicators , journal =. 2025 , doi =

  52. [52]

    IEEE Symposium on Security and Privacy (SP) , pages =

    Robust de-anonymization of large sparse datasets , author =. IEEE Symposium on Security and Privacy (SP) , pages =. 2008 , organization =

  53. [53]

    Nature Communications , volume =

    Estimating the success of re-identifications in incomplete datasets using generative models , author =. Nature Communications , volume =. 2019 , publisher =

  54. [54]

    Proceedings of the 33rd International Conference on Automata, Languages and Programming (ICALP 2006) , pages =

    Cynthia Dwork , title =. Proceedings of the 33rd International Conference on Automata, Languages and Programming (ICALP 2006) , pages =. 2006 , publisher =

  55. [55]

    Proceedings of The 25th International Conference on Artificial Intelligence and Statistics , pages =

    Parametric Bootstrap for Differentially Private Confidence Intervals , author =. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics , pages =. 2022 , editor =

  56. [56]

    Proceedings of the 33rd International Conference on Machine Learning (ICML) , pages =

    Marco Gaboardi and Hyun Lim and Ryan Rogers and Salil Vadhan , title =. Proceedings of the 33rd International Conference on Machine Learning (ICML) , pages =. 2016 , publisher =

  57. [57]

    Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) , series =

    Ryan Rogers and Daniel Kifer , title =. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) , series =. 2017 , publisher =

  58. [58]

    Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security , pages =

    Sara Couch and Zeyu Kazan and Kai Shi and Andrew Bray and Alex Groce , title =. Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security , pages =. 2019 , publisher =

  59. [59]

    2018 1st International Conference on Data Intelligence and Security (ICDIS) , pages =

    Zack Campbell and Andrew Bray and Anna Ritz and Alex Groce , title =. 2018 1st International Conference on Data Intelligence and Security (ICDIS) , pages =. 2018 , publisher =

  60. [60]

    Journal of Machine Learning Research , volume=

    Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies , author=. Journal of Machine Learning Research , volume=

  61. [61]

    2019 , publisher =

    High-Dimensional Statistics: A Non-Asymptotic Viewpoint , author =. 2019 , publisher =

  62. [62]

    1993 , publisher=

    Statistical Models Based on Counting Processes , author=. 1993 , publisher=

  63. [63]

    Journal of Interactive Marketing , volume=

    How to project customer retention , author=. Journal of Interactive Marketing , volume=. 2007 , publisher=

  64. [64]

    2021 , publisher=

    Statistical methods for reliability data , author=. 2021 , publisher=

  65. [65]

    Therneau and Patricia M

    Terry M. Therneau and Patricia M. Grambsch , title =. 2000 , publisher =

  66. [66]

    Journal of Machine Learning Research , year =

    Jongmin Mun and Seungwoo Kwak and Ilmun Kim , title =. Journal of Machine Learning Research , year =

  67. [67]

    Proceedings on Privacy Enhancing Technologies , volume =

    Josh Smith and Hassan Jameel Asghar and Gianpaolo Gioiosa and Sirine Mrabet and Serge Gaspers and Paul Tyler , title =. Proceedings on Privacy Enhancing Technologies , volume =. 2022 , doi =

  68. [68]

    1990 , publisher =

    U-Statistics: Theory and Practice , author =. 1990 , publisher =. doi:10.1201/9780203734520 , isbn =