pith. machine review for the scientific record.

arxiv: 2605.10088 · v1 · submitted 2026-05-11 · 📊 stat.ME

Sample size and power calculations for causal inference with time-to-event outcomes

Bo Liu, Chengxin Yang, Fan Li

Pith reviewed 2026-05-12 03:30 UTC · model grok-4.3

classification 📊 stat.ME
keywords sample size calculation · power analysis · causal inference · time-to-event data · marginal hazard ratio · inverse probability weighting · Cox proportional hazards

The pith

A new sample size formula for marginal hazard ratios in time-to-event causal inference applies to both randomized trials and observational studies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper derives power and sample size formulas for estimating the marginal hazard ratio under a marginal structural Cox model with time-to-event outcomes. It extends robust sandwich variance theory to obtain the analytical asymptotic variance of the inverse probability weighted partial likelihood estimator. The resulting formula works for randomized trials using only the treatment proportion, effect size, and event rate, while observational studies require one additional overlap coefficient that captures covariate similarity between groups. The work also supplies a variance inflation approach for any propensity score balancing weights and corrects mischaracterizations in classic log-rank formulas.

Core claim

The authors derive the asymptotic variance of the inverse probability weighted partial likelihood estimator for the marginal hazard ratio and obtain a new sample size formula that remains valid at any prespecified effect size, requiring only treatment proportion, effect size, and event rate for randomized trials plus an overlap coefficient for observational studies.

What carries the argument

The inverse probability weighted partial likelihood estimator for the marginal structural Cox proportional hazards model with treatment as sole predictor, together with its robust sandwich variance expression.
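In symbols, the estimator described above can be sketched as follows. The notation is assumed for illustration and may differ from the paper's: $A_i$ is treatment, $X_i$ covariates, $e(X_i)$ the propensity score, $T_i$ and $\delta_i$ the observed time and event indicator, and $Y_j(t)$ the at-risk indicator. These are the standard textbook forms of a weighted Cox score and a sandwich variance, not the paper's closed-form result, which this page does not reproduce.

```latex
% Marginal structural Cox model with treatment as the only predictor
\lambda^{a}(t) = \lambda_0(t)\,\exp(\beta a), \qquad a \in \{0, 1\}, \qquad \text{marginal HR} = e^{\beta}

% Inverse probability of treatment weights
w_i = \frac{A_i}{e(X_i)} + \frac{1 - A_i}{1 - e(X_i)}, \qquad e(x) = \Pr(A = 1 \mid X = x)

% Weighted partial-likelihood score; \hat\beta solves U(\hat\beta) = 0
U(\beta) = \sum_{i=1}^{n} w_i\,\delta_i \left\{ A_i - \frac{S^{(1)}(\beta, T_i)}{S^{(0)}(\beta, T_i)} \right\},
\qquad
S^{(k)}(\beta, t) = \sum_{j=1}^{n} w_j\,Y_j(t)\,A_j^{k}\,e^{\beta A_j}, \quad k \in \{0, 1\}

% Generic sandwich form: \hat{\mathcal{I}} = -\partial U / \partial \beta at \hat\beta,
% \hat{\Sigma} = estimated variance of the score contributions
\widehat{\operatorname{Var}}(\hat\beta) = \hat{\mathcal{I}}^{-1}\,\hat{\Sigma}\,\hat{\mathcal{I}}^{-1}
```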

If this is right

  • For randomized trials the required sample size depends only on treatment proportion, effect size, and event rate.
  • For observational studies the formula incorporates one extra overlap coefficient summarizing covariate balance between groups.
  • A variance inflation factor can be layered onto the baseline variance for any choice of propensity score balancing weights.
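The last bullet does not say how the inflation factor is computed, and this page does not reproduce the paper's expression. As a rough stand-in only, the sketch below uses Kish's design effect from survey sampling, which measures how much unequal weights inflate variance relative to equal weights. The function names are hypothetical, and the two weighting schemes shown (inverse probability weights and overlap weights) are common choices, not necessarily the ones the paper analyzes.

```python
def ipw_weights(treat, propensity):
    """Inverse probability of treatment weights: 1/e for treated, 1/(1-e) for controls."""
    return [a / e + (1 - a) / (1 - e) for a, e in zip(treat, propensity)]


def overlap_weights(treat, propensity):
    """Overlap weights: 1-e for treated, e for controls."""
    return [a * (1 - e) + (1 - a) * e for a, e in zip(treat, propensity)]


def kish_design_effect(weights):
    """Kish's approximation n * sum(w^2) / (sum w)^2; equals 1 when all weights are equal."""
    total = sum(weights)
    return len(weights) * sum(w * w for w in weights) / (total * total)


if __name__ == "__main__":
    # Hypothetical treated-arm propensity scores, for illustration only.
    treated_ps = [0.2, 0.5, 0.8, 0.9]
    ones = [1] * len(treated_ps)
    print(kish_design_effect(ipw_weights(ones, treated_ps)))
    print(kish_design_effect(overlap_weights(ones, treated_ps)))
```

Under this approximation, a planner would multiply an unweighted sample size by the design effect; whether that matches the paper's anchored inflation approach is not something this page settles.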

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of the formula would let investigators plan survival studies with explicit causal targets without defaulting to log-rank approximations.
  • Routine reporting of the overlap coefficient could become a standard diagnostic when sample size is justified in observational survival analyses.

Load-bearing premise

The marginal structural Cox model satisfies the proportional hazards assumption and the regularity conditions for asymptotic normality of the IPW estimator hold.

What would settle it

A Monte Carlo simulation that generates data under a known marginal structural Cox model with proportional hazards, applies the IPW estimator, and checks whether the empirical rejection rate matches the power predicted by the new formula at the calculated sample size.
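A minimal sketch of such a check for the randomized-trial case is given below, under simplifying assumptions that are not from the paper: exponential event and censoring times, no covariates, 1:1 allocation, and a plain log-rank test standing in for the IPW partial-likelihood analysis. All function names and default rates are hypothetical. The power predicted by the new formula is not reproduced on this page, so the sketch only produces the empirical side of the comparison.

```python
import math
import random
from statistics import NormalDist


def simulate_trial(n, hazard_ratio, base_rate, censor_rate, rng):
    """Simulate one 1:1 randomized trial; returns (time, event, treatment) tuples."""
    rows = []
    for _ in range(n):
        treat = 1 if rng.random() < 0.5 else 0
        rate = base_rate * (hazard_ratio if treat else 1.0)
        t_event = rng.expovariate(rate)
        t_cens = rng.expovariate(censor_rate)
        rows.append((min(t_event, t_cens), t_event <= t_cens, treat))
    return rows


def logrank_z(rows):
    """Standardized log-rank statistic (assumes no tied event times)."""
    rows = sorted(rows, key=lambda r: r[0])
    at_risk = len(rows)
    at_risk_treated = sum(r[2] for r in rows)
    observed_minus_expected = 0.0
    variance = 0.0
    for _, event, treat in rows:
        if event:
            frac = at_risk_treated / at_risk  # share of the risk set that is treated
            observed_minus_expected += treat - frac
            variance += frac * (1.0 - frac)
        at_risk -= 1
        at_risk_treated -= treat
    return observed_minus_expected / math.sqrt(variance) if variance > 0 else 0.0


def empirical_power(n, hazard_ratio, n_reps=500, alpha=0.05,
                    base_rate=1.0, censor_rate=0.5, seed=0):
    """Fraction of simulated trials in which the two-sided log-rank test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = sum(
        abs(logrank_z(simulate_trial(n, hazard_ratio, base_rate, censor_rate, rng))) > z_crit
        for _ in range(n_reps)
    )
    return rejections / n_reps


if __name__ == "__main__":
    print("type I error:", empirical_power(200, 1.0))
    print("power at HR = 0.5:", empirical_power(200, 0.5))
```

Running `empirical_power` at the sample size a formula returns, and comparing the result with the nominal target, is the kind of check described above. An observational version would additionally need covariates, a propensity model, and weighted estimation.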

Original abstract

This paper develops power and sample size formulas for causal inference with time-to-event outcomes. The target estimand is the marginal hazard ratio: the coefficient of a marginal structural Cox proportional hazard model with treatment as the only predictor. We extend the robust sandwich variance theory and derive the analytical form of the asymptotic variance for the inverse probability weighted partial likelihood estimator. Building on this, we derive a new sample size formula valid at any prespecified effect size, applicable to both randomized trials and observational studies. For randomized trials, the formula requires only the canonical inputs of treatment proportion, effect size, and event rate. The new formula corrects the mischaracterization of classic log-rank-based formulas. For observational studies, one additional input suffices: an overlap coefficient summarizing covariate similarity between comparison groups. We further develop a variance inflation approach applicable to any propensity score balancing weights, anchored to the corrected baseline variance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops analytical power and sample size formulas for causal inference targeting the marginal hazard ratio, defined as the coefficient in a marginal structural Cox proportional hazards model with treatment as the sole predictor. It extends robust sandwich variance theory to obtain the asymptotic variance of the inverse-probability-weighted partial-likelihood estimator, then derives a closed-form sample-size expression that, for randomized trials, requires only treatment proportion, effect size, and event rate; for observational studies it adds an overlap coefficient. A variance-inflation approach is also given for general propensity-score balancing weights. The work claims to correct mischaracterizations present in classic log-rank-based formulas.

Significance. If the derivations are correct, the manuscript supplies a practical, low-input tool for study planning in causal survival analysis that applies equally to randomized and observational settings. The analytical (rather than simulation-based) form of the asymptotic variance and the explicit reduction to canonical RCT inputs are notable strengths, as is the anchoring of the observational formula to a single, estimable overlap summary. These features would make the formulas directly usable by applied researchers without requiring fitted models or extensive Monte Carlo work.

major comments (2)
  1. [Target estimand and assumptions] Target estimand section: the central claim rests on the existence of a constant marginal hazard ratio in the structural Cox model. The manuscript states this proportional-hazards assumption but supplies no analytic conditions (e.g., absence of treatment-covariate interactions on the hazard scale or null covariate main effects) under which marginalization preserves proportionality when a conditional Cox model holds. Because both the variance derivation and the subsequent power formula presuppose this marginal PH property, the omission limits the scope of the result and requires either explicit conditions or a sensitivity discussion.
  2. [Sample-size formula] Derivation of the sample-size formula (RCT case): the claim that the formula is 'valid at any prespecified effect size' and requires only the three canonical inputs is load-bearing for the paper's practical contribution. The explicit algebraic form, the handling of the event rate under a non-null hazard ratio, and the precise manner in which the new expression differs from (and corrects) the classic Schoenfeld or Freedman log-rank formulas should be displayed with the relevant equations so that readers can verify the correction.
minor comments (2)
  1. [Abstract] The abstract asserts that the new formula 'corrects the mischaracterization of classic log-rank-based formulas' without naming the specific formulas or the nature of the mischaracterization; a brief parenthetical reference in the abstract would improve clarity.
  2. [Observational-study extension] Notation for the overlap coefficient is introduced without an explicit formula or estimation procedure in the summary sections; adding the definition (e.g., as a function of the propensity-score distribution) would aid reproducibility.
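For orientation, the classic log-rank formulas named in the second major comment can be computed as in the sketch below. These are the textbook Schoenfeld and Freedman (1:1 allocation) event-count formulas, i.e. the baseline the paper says it corrects, not the paper's new formula. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def _z_sum_squared(alpha, power):
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    nd = NormalDist()
    return (nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)) ** 2


def schoenfeld_events(hazard_ratio, p_treat=0.5, alpha=0.05, power=0.8):
    """Schoenfeld: d = (z_a + z_b)^2 / (p (1 - p) (log HR)^2)."""
    return _z_sum_squared(alpha, power) / (
        p_treat * (1.0 - p_treat) * math.log(hazard_ratio) ** 2
    )


def freedman_events(hazard_ratio, alpha=0.05, power=0.8):
    """Freedman, equal allocation: d = (z_a + z_b)^2 ((1 + HR) / (1 - HR))^2."""
    return _z_sum_squared(alpha, power) * ((1.0 + hazard_ratio) / (1.0 - hazard_ratio)) ** 2


def total_sample_size(events, event_prob):
    """Convert required events to subjects, given the overall event probability."""
    return math.ceil(events / event_prob)


if __name__ == "__main__":
    d = schoenfeld_events(0.5)
    print(math.ceil(d), math.ceil(freedman_events(0.5)), total_sample_size(d, 0.5))
```

With a hazard ratio of 0.5, two-sided alpha of 0.05, and 80% power, these give 66 events (Schoenfeld) and 71 events (Freedman). Both treat the event probability as a separate input, which is where the referee asks the paper to show how its handling under a non-null hazard ratio differs.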

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and describe the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [Target estimand and assumptions] Target estimand section: the central claim rests on the existence of a constant marginal hazard ratio in the structural Cox model. The manuscript states this proportional-hazards assumption but supplies no analytic conditions (e.g., absence of treatment-covariate interactions on the hazard scale or null covariate main effects) under which marginalization preserves proportionality when a conditional Cox model holds. Because both the variance derivation and the subsequent power formula presuppose this marginal PH property, the omission limits the scope of the result and requires either explicit conditions or a sensitivity discussion.

    Authors: We agree that explicit conditions for the marginal proportional hazards property are needed to delineate the scope of the results. In the revised manuscript we will add a new subsection to the Target Estimand section that states the analytic conditions under which marginalization preserves proportionality (absence of treatment-by-covariate interactions on the hazard scale together with the requirement that covariate main effects do not induce time dependence in the marginal hazard). We will also include a short sensitivity paragraph noting that, when these conditions fail, the marginal hazard ratio is interpretable as a time-averaged effect and our formulas remain a useful approximation. This addition directly addresses the referee’s concern while keeping the focus on the marginal estimand.

    revision: partial

  2. Referee: [Sample-size formula] Derivation of the sample-size formula (RCT case): the claim that the formula is 'valid at any prespecified effect size' and requires only the three canonical inputs is load-bearing for the paper's practical contribution. The explicit algebraic form, the handling of the event rate under a non-null hazard ratio, and the precise manner in which the new expression differs from (and corrects) the classic Schoenfeld or Freedman log-rank formulas should be displayed with the relevant equations so that readers can verify the correction.

    Authors: We appreciate the request for greater explicitness. In the revised Sample Size Formula section we will present the closed-form asymptotic variance expression, including the integral term that incorporates the event rate under a non-null hazard ratio. We will also add a side-by-side algebraic comparison with the Schoenfeld and Freedman formulas, highlighting the precise points at which the classic expressions omit the non-null adjustment and the inverse-probability weighting correction. These displayed equations will confirm that the new formula depends only on treatment proportion, effect size, and event rate for randomized trials, thereby making the claimed correction transparent to readers.

    revision: yes
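The concern in the first exchange, that a conditional Cox model need not marginalize to a proportional-hazards model, can be illustrated numerically. The sketch below assumes a hypothetical conditional model with exponential baseline hazard, one binary covariate X independent of treatment, and no treatment-by-covariate interaction; none of these specifics come from the paper. It computes the exact marginal hazard ratio over time by averaging over X.

```python
import math


def marginal_hazard(t, treat, beta, gamma, base_rate=1.0, p_x=0.5):
    """Marginal hazard at time t in arm `treat`, averaging over X ~ Bernoulli(p_x).

    Assumed conditional model: hazard = base_rate * exp(beta * treat + gamma * x).
    """
    density = 0.0
    survival = 0.0
    for x, prob in ((0, 1.0 - p_x), (1, p_x)):
        rate = base_rate * math.exp(beta * treat + gamma * x)
        surv_x = math.exp(-rate * t)
        density += prob * rate * surv_x
        survival += prob * surv_x
    return density / survival


def marginal_hazard_ratio(t, beta, gamma):
    """Ratio of marginal hazards, treated over control, at time t."""
    return marginal_hazard(t, 1, beta, gamma) / marginal_hazard(t, 0, beta, gamma)


if __name__ == "__main__":
    beta = math.log(0.5)  # conditional hazard ratio of 0.5
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, marginal_hazard_ratio(t, beta, gamma=math.log(3.0)),
              marginal_hazard_ratio(t, beta, gamma=0.0))
```

With a null covariate effect (`gamma = 0`) the marginal ratio stays at 0.5 for all t. With a nonzero covariate effect it starts at 0.5 at t = 0 and then drifts, so the marginal model is no longer proportional even though the conditional one is.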

Circularity Check

0 steps flagged

No circularity in derivation chain

Full rationale

The paper extends standard robust sandwich variance theory to derive the analytical asymptotic variance for the IPW partial likelihood estimator of the marginal structural Cox model coefficient. The sample size and power formulas are then constructed directly from this variance expression using conventional power calculation methods. Required inputs such as treatment proportion, effect size, event rate, and overlap coefficient are external data summaries or prespecified parameters, not outputs of the same fitted model or self-referential quantities. No load-bearing steps reduce by construction to self-citations, fitted inputs renamed as predictions, or ansatzes; the central derivation remains independent and self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central derivations rest on standard regularity conditions for asymptotic normality of IPW estimators and the proportional hazards assumption for the marginal structural model; no new free parameters are introduced beyond user-specified inputs such as event rate and overlap coefficient.

axioms (2)
  • [domain assumption] The marginal structural Cox model satisfies the proportional hazards assumption.
    Invoked to justify the target estimand as a constant marginal hazard ratio.
  • [standard math] Standard regularity conditions hold for the asymptotic normality of the IPW partial likelihood estimator.
    Required for the sandwich variance derivation to be valid.

pith-pipeline@v0.9.0 · 5446 in / 1440 out tokens · 48155 ms · 2026-05-12T03:30:13.632578+00:00 · methodology
