pith. machine review for the scientific record.

arxiv: 2604.10398 · v1 · submitted 2026-04-12 · 📊 stat.ME · stat.ML

Recognition: unknown

Estimating heterogeneous treatment effects with survival outcomes via a deep survival learner

Pith reviewed 2026-05-10 16:35 UTC · model grok-4.3

classification 📊 stat.ME stat.ML
keywords: heterogeneous treatment effects · survival analysis · right censoring · deep learning · doubly robust · conditional average treatment effect · time-varying effects

The pith

A deep survival learner estimates time-specific heterogeneous treatment effects in right-censored data using doubly robust pseudo-outcomes and joint neural network fitting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a method to estimate how a treatment's effect on survival varies across individuals and changes over follow-up, even though some patients are censored before the event occurs. It builds a doubly robust pseudo-outcome whose conditional expectation identifies the conditional average treatment effect (CATE) at each time point if either the outcome (survival) model or the treatment assignment model is correctly specified. A multi-output deep neural network then learns the full trajectory of these effects at once by sharing representations across time points. The theory shows that, when effects are smooth over time, this joint approach keeps approximation error low while reducing variance relative to fitting each time point separately. The approach matters because single-time analyses cannot show when effects strengthen or weaken, and because it performs well in simulations under nuisance model misspecification and in a lung cancer application.
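In symbols, the estimand can be written as follows. This is a minimal sketch: the notation τ(t, x) follows the simulated rebuttal further down the page, the definition follows the Figure 1 caption, and the potential-outcome symbols T(1), T(0) and the shorthand S_a are our own assumptions rather than the paper's stated notation.

```latex
\tau(t, x)
  \;=\; \Pr\{T(1) > t \mid X = x\} \;-\; \Pr\{T(0) > t \mid X = x\}
  \;=\; S_1(t \mid x) - S_0(t \mid x),
  \qquad t \in \{t_1, \dots, t_K\}.
```

Here T(a) is the potential survival time under treatment a, X collects the pre-treatment covariates, and t_1, ..., t_K are the prespecified time points.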

Core claim

The deep survival learner (DSL) constructs a doubly robust pseudo-outcome for the time-specific CATE that accounts for right censoring and remains unbiased if either the outcome model or the treatment assignment model is correctly specified. A multi-output deep neural network with shared representations then jointly estimates the CATE function over a range of time points. Error bounds establish that, under smoothness conditions, joint estimation over time controls estimation error by exploiting temporal structure, without substantial extra approximation cost relative to separate estimation at each time point. Cross-fitting mitigates bias from estimating the nuisance functions.
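To make the pseudo-outcome idea concrete, here is a minimal sketch of one generic construction: an inverse-probability-of-censoring-weighted (IPCW) survival indicator plugged into the usual augmented inverse-propensity (AIPW) form. It is not taken from the paper, whose exact pseudo-outcome may differ (for example, by adding an augmentation term for censoring). The function name and all argument names are hypothetical.

```python
def dr_pseudo_outcome(u, a, t, e_x, mu1_t, mu0_t, g_t):
    """Generic IPCW + AIPW pseudo-outcome for tau(t, x) at a single time t.

    Assumed inputs (all hypothetical names):
      u      observed follow-up time, min(event time, censoring time)
      a      treatment indicator, 0 or 1
      t      time point of interest
      e_x    estimated propensity P(A = 1 | X)
      mu1_t  estimated P(T > t | X, A = 1)
      mu0_t  estimated P(T > t | X, A = 0)
      g_t    estimated censoring survival P(C > t | X, A) for this subject
    """
    # IPCW version of the indicator 1{T > t}: subjects still under
    # observation after t are up-weighted by 1 / G(t).
    y_ipcw = (1.0 if u > t else 0.0) / g_t

    # AIPW augmentation terms; only the term for the received arm is non-zero.
    aug_treated = a / e_x * (y_ipcw - mu1_t)
    aug_control = (1 - a) / (1 - e_x) * (y_ipcw - mu0_t)

    # Plug-in contrast plus the two residual corrections.
    return (mu1_t - mu0_t) + aug_treated - aug_control


# One treated subject still event-free and uncensored after t = 3.
phi = dr_pseudo_outcome(u=5.0, a=1, t=3.0, e_x=0.5, mu1_t=0.8, mu0_t=0.6, g_t=0.8)
```

Regressing such pseudo-outcomes on covariates, one per time point, is what would give a time-specific CATE estimate in this generic setup.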

What carries the argument

Doubly robust pseudo-outcome for time-specific CATE, estimated jointly via multi-output deep neural network with shared layers.
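The shared-layer idea can be sketched in a few lines of plain Python. This is an illustration only: a small ReLU trunk feeds a single linear layer with one output per time point, and a squared-error loss is averaged over subjects and time points. The layer sizes, the initialization, and the choice of squared-error loss are our assumptions, and every function name is hypothetical.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(w, b, v):
    # Dense layer: one output per row of the weight matrix w.
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi for row, bi in zip(w, b)]


def init_layer(n_out, n_in, rng):
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def init_network(n_features, hidden, n_times, seed=0):
    rng = random.Random(seed)
    return {
        # Shared representation used by every time point.
        "trunk": [init_layer(hidden, n_features, rng), init_layer(hidden, hidden, rng)],
        # Multi-output head: one CATE value per time point.
        "head": init_layer(n_times, hidden, rng),
    }


def predict_trajectory(params, x):
    h = x
    for w, b in params["trunk"]:
        h = relu(linear(w, b, h))
    w, b = params["head"]
    return linear(w, b, h)


def joint_loss(params, xs, pseudo):
    """Squared error averaged over subjects and time points (assumed loss)."""
    total, count = 0.0, 0
    for x, phi in zip(xs, pseudo):
        for pred, target in zip(predict_trajectory(params, x), phi):
            total += (pred - target) ** 2
            count += 1
    return total / count


params = init_network(n_features=3, hidden=4, n_times=10, seed=1)
trajectory = predict_trajectory(params, [0.5, -1.0, 2.0])  # 10 values, one per time
```

Because every output reads from the same trunk, information from all time points shapes the shared representation, which is the mechanism the stability argument relies on.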

If this is right

  • Clinicians can obtain full trajectories of treatment effect heterogeneity rather than estimates at isolated times.
  • The estimator stays consistent for the CATE as long as at least one of the two nuisance models (outcome or treatment assignment) is correctly specified.
  • Joint estimation over time yields more stable results than pointwise estimation when effects vary smoothly.
  • Cross-fitting reduces overfitting bias in the presence of flexible nuisance estimators.
  • Applied to real data, it can uncover patient subgroups with differing time-dependent benefits from treatment.
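The cross-fitting point above can be sketched generically: split the sample into folds, fit the nuisance model on the other folds, and evaluate it only on the held-out fold. The helpers below are hypothetical and say nothing about how the paper combines cross-fitting with the network; `fit` and `predict` stand in for any nuisance learner.

```python
import random


def make_folds(n, n_folds, seed=0):
    """Randomly partition indices 0..n-1 into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]


def cross_fit(data, n_folds, fit, predict, seed=0):
    """Out-of-fold nuisance predictions: each row is scored by a model
    that never saw that row during fitting."""
    out = [None] * len(data)
    for fold in make_folds(len(data), n_folds, seed):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        model = fit(train)
        for i in fold:
            out[i] = predict(model, data[i])
    return out


# Toy nuisance "model": the training-fold mean, used as a constant prediction.
values = [1.0, 2.0, 3.0, 4.0]
oof = cross_fit(
    values,
    n_folds=4,
    fit=lambda train: sum(train) / len(train),
    predict=lambda model, row: model,
)
```

With four folds on four rows this is leave-one-out, so each prediction is the mean of the other three values.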

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The smoothness-based stability gain suggests that similar joint modeling could improve other longitudinal or time-to-event estimators.
  • This framework might be adapted to handle competing risks or time-varying treatments by extending the pseudo-outcome construction.
  • Practitioners could use the estimated trajectories to time interventions or monitor when benefits accrue differently across patients.

Load-bearing premise

At least one of the outcome or treatment assignment models is correctly specified, and the heterogeneous treatment effects satisfy smoothness conditions over time.

What would settle it

If both the outcome regression and treatment assignment models are misspecified, the estimated CATEs should show substantial bias in finite samples; alternatively, when treatment effects change abruptly over time, the joint estimator should not show stability improvements over separate time-point fits.
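The first half of this test can be checked exactly on a toy problem. The sketch below ignores censoring (the survival indicator is treated as fully observed) and uses made-up numbers for a single covariate value. It computes the exact expectation of a standard AIPW pseudo-outcome under working models that may be wrong. All names and values are hypothetical.

```python
def expected_pseudo_outcome(e_true, mu1_true, mu0_true, e_work, mu1_work, mu0_work):
    """Exact E[pseudo-outcome | X = x] by enumerating treatment and outcome.

    *_true are the data-generating quantities at this x;
    *_work are the (possibly misspecified) working-model values.
    """
    total = 0.0
    for a, p_a in ((1, e_true), (0, 1.0 - e_true)):
        p_survive = mu1_true if a == 1 else mu0_true
        for y, p_y in ((1, p_survive), (0, 1.0 - p_survive)):
            phi = (
                (mu1_work - mu0_work)
                + a / e_work * (y - mu1_work)
                - (1 - a) / (1.0 - e_work) * (y - mu0_work)
            )
            total += p_a * p_y * phi
    return total


truth = dict(e_true=0.3, mu1_true=0.6, mu0_true=0.5)  # true CATE = 0.1

# Outcome model wrong, propensity right: still recovers about 0.1.
only_outcome_wrong = expected_pseudo_outcome(**truth, e_work=0.3, mu1_work=0.2, mu0_work=0.9)

# Propensity wrong, outcome model right: still recovers about 0.1.
only_propensity_wrong = expected_pseudo_outcome(**truth, e_work=0.8, mu1_work=0.6, mu0_work=0.5)

# Both wrong: the expectation moves to about 0.85, far from 0.1.
both_wrong = expected_pseudo_outcome(**truth, e_work=0.8, mu1_work=0.2, mu0_work=0.9)
```

The same pattern, with censoring added and finite samples in place of exact expectations, is what a simulation designed to settle the claim would need to reproduce.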

Figures

Figures reproduced from arXiv: 2604.10398 by Jian Kang, Yi Li, Yuming Sun.

Figure 1. Kaplan–Meier survival curves by treatment group. We estimate CATEs at 10 prespecified time points ranging from 1 to 23 years after surgery. At each time point t, the CATE is defined as the difference in survival probabilities at time t between surgery plus perioperative chemotherapy and surgery alone, conditional on pre-treatment covariates. In our implementation, DSL employs a three-layer feedforward netw…
Figure 2. Estimated conditional average treatment effects across key baseline covariates. Each panel displays the estimated CATE as a function of one covariate of interest (gender, tumor stage, age, body mass index, or smoking intensity), with all remaining covariates fixed at their sample means or modal values.
Original abstract

Estimating heterogeneous treatment effects in survival settings is complicated by right censoring as well as the time-varying nature of the estimand. While the conditional average treatment effect (CATE) provides a natural target, most existing approaches focus on a single prespecified time point and do not account for the temporal trajectory, leading to instability in estimation. We propose a deep survival learner (DSL) for estimating heterogeneous treatment effects with right-censored outcomes. The method is based on a doubly robust pseudo-outcome whose conditional expectation identifies time-specific CATEs under standard assumptions. This construction remains unbiased if either the outcome model or the treatment assignment model is correctly specified, when properly accounting for censoring. To estimate CATEs over a clinically relevant time spectrum, DSL employs a multi-output deep neural network with shared representations, enabling joint estimation of treatment effect trajectories. From a theoretical perspective, we derive error bounds for both pointwise and joint estimation over time. We show that joint estimation can leverage temporal structure to control estimation error without incurring much additional approximation cost under smoothness conditions, leading to improved stability relative to separate estimation. Cross-fitting is incorporated to reduce overfitting and mitigate bias arising from flexible nuisance estimation. Simulation studies demonstrate favorable finite-sample performance, particularly under nuisance model misspecification. Applied to the Boston Lung Cancer Study, DSL reveals heterogeneity in the effects of perioperative chemotherapy across patient characteristics and over time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a deep survival learner (DSL) for estimating heterogeneous treatment effects with right-censored survival outcomes. It constructs a doubly robust pseudo-outcome whose conditional expectation identifies time-specific CATEs, employs a multi-output deep neural network with shared representations for joint estimation of treatment effect trajectories over time, derives error bounds for pointwise and joint estimation, incorporates cross-fitting to mitigate bias from flexible nuisance estimation, and reports favorable finite-sample performance in simulations under nuisance misspecification along with an application to the Boston Lung Cancer Study.

Significance. If the doubly robust property holds after proper accounting for censoring and the error bounds confirm stability gains from joint estimation under smoothness, the work would advance methods for time-varying HTE estimation in survival data by combining DR identification with deep learning for trajectory estimation. The simulation results under misspecification and the clinical application provide practical support for the approach.

major comments (2)
  1. [Abstract] Abstract: The central claim that the pseudo-outcome 'remains unbiased if either the outcome model or the treatment assignment model is correctly specified, when properly accounting for censoring' is load-bearing for the unbiasedness result. It is unclear whether the construction is only doubly robust or additionally requires a correctly specified censoring distribution G(t|X,A) (as is standard for IPCW or augmented IPCW in right-censored data). If the pseudo-outcome augments only the outcome and propensity while treating censoring separately, unbiasedness would fail under censoring misspecification even if one of the other two models is correct. This requires explicit clarification in the identification argument.
  2. [Theoretical results] Theoretical analysis: The claim that joint estimation 'can leverage temporal structure to control estimation error without incurring much additional approximation cost under smoothness conditions' is load-bearing for the stability advantage over separate estimation. The specific smoothness assumptions, the form of the error bounds (pointwise vs. integrated over time), and the derivation showing no extra approximation cost need to be verified to confirm the result holds.
minor comments (2)
  1. [Abstract] The abstract mentions incorporation of cross-fitting but does not detail its implementation within the multi-output neural network architecture or how it interacts with the shared representations.
  2. [Simulations] Simulation studies are described as showing favorable performance, but the abstract lacks specifics on the metrics (e.g., MSE, coverage), data-generating processes, or how misspecification was induced.
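Major comment 1 turns on the censoring distribution G. As a rough illustration of what estimating G involves, the sketch below computes a marginal Kaplan–Meier estimate of G(t) = P(C > t) by treating censoring as the event of interest. It ignores the covariates X and treatment A on which the referee's G(t|X,A) conditions, and it assumes that events precede censorings at tied times. The function name is hypothetical.

```python
def km_censoring_survival(times, events, t):
    """Kaplan-Meier estimate of G(t) = P(C > t).

    times  : observed follow-up times, min(event time, censoring time)
    events : 1 if the event was observed, 0 if the subject was censored
    Censoring (events == 0) plays the role of the "event" here.
    """
    g = 1.0
    censor_times = sorted({u for u, d in zip(times, events) if d == 0 and u <= t})
    for s in censor_times:
        # Assumed tie convention: subjects whose event occurs at s are
        # no longer at risk of being censored at s.
        at_risk = sum(1 for u, d in zip(times, events) if u > s or (u == s and d == 0))
        n_censored = sum(1 for u, d in zip(times, events) if u == s and d == 0)
        g *= 1.0 - n_censored / at_risk
    return g


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 0]

g_at_3 = km_censoring_survival(times, events, t=3.0)  # 2/3 for this toy sample
ipcw_weight = 1.0 / g_at_3  # weight for a subject still under observation after t = 3
```

If this estimate is inconsistent, the resulting weights are wrong no matter how good the outcome and propensity models are, which is the gap the comment asks the authors to address.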

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight two important areas requiring clarification: the precise scope of the doubly robust property under censoring, and the explicit conditions and derivations supporting the joint estimation advantage. We have revised the manuscript to address both points directly, adding explicit statements in the identification section, updating the abstract, and expanding the theoretical results with clearer assumptions and bound comparisons. We believe these changes strengthen the paper without altering its core contributions.

  1. Referee: [Abstract] Abstract: The central claim that the pseudo-outcome 'remains unbiased if either the outcome model or the treatment assignment model is correctly specified, when properly accounting for censoring' is load-bearing for the unbiasedness result. It is unclear whether the construction is only doubly robust or additionally requires a correctly specified censoring distribution G(t|X,A) (as is standard for IPCW or augmented IPCW in right-censored data). If the pseudo-outcome augments only the outcome and propensity while treating censoring separately, unbiasedness would fail under censoring misspecification even if one of the other two models is correct. This requires explicit clarification in the identification argument.

    Authors: We agree that the original wording in the abstract was insufficiently precise. The pseudo-outcome is constructed via an augmented inverse-probability-of-censoring-weighted (AIPCW) estimator. It is doubly robust with respect to the outcome regression and propensity score (i.e., remains unbiased if either is correctly specified), but the censoring survival function G(t|X,A) must be consistently estimated for the weights to be valid. We have revised Section 2.2 to state the identification assumptions explicitly, including that the censoring model is estimated separately via a correctly specified Cox model or nonparametric estimator, and we have updated the abstract to read: 'This construction remains unbiased if either the outcome model or the treatment assignment model is correctly specified, provided the censoring distribution is consistently estimated.' A short remark has also been added noting that full triple robustness would require an additional augmentation term for the censoring model, which is left for future work. revision: yes

  2. Referee: [Theoretical results] Theoretical analysis: The claim that joint estimation 'can leverage temporal structure to control estimation error without incurring much additional approximation cost under smoothness conditions' is load-bearing for the stability advantage over separate estimation. The specific smoothness assumptions, the form of the error bounds (pointwise vs. integrated over time), and the derivation showing no extra approximation cost need to be verified to confirm the result holds.

    Authors: We appreciate the request for greater transparency. The smoothness assumption is that the time-varying CATE function τ(t,x) belongs to a Hölder class of order α > 1/2 uniformly in t (Assumption 4). Theorem 3 provides both pointwise bounds (O(n^{-2β/(2β+d)}) for each fixed t) and integrated bounds over [0,T] that exploit the shared representation; the extra approximation error term arising from the multi-output head is shown to be of lower order than the separate-estimation penalty when the temporal smoothness holds, because the shared layers amortize the approximation cost across time points. We have added a new paragraph in Section 3.3 that states the precise Hölder exponent, reproduces the key steps of the derivation from Appendix B in the main text, and includes a side-by-side comparison of the joint versus separate estimation rates. These additions make the 'no extra approximation cost' claim fully verifiable from the main paper. revision: yes
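To get a feel for the pointwise bound quoted in the second response, the snippet below evaluates n^{-2β/(2β+d)} for a few settings. The response does not define β or d, so reading β as a smoothness exponent and d as the covariate dimension is our assumption, and the numbers are illustrative only. No joint-estimation rate is computed because the page does not state one.

```python
def pointwise_rate(n, beta, d):
    """Evaluate n ** (-2*beta / (2*beta + d)), the quoted pointwise bound
    up to constants. beta and d are interpreted as smoothness and
    covariate dimension (an assumption, not stated in the response)."""
    return n ** (-2.0 * beta / (2.0 * beta + d))


n = 10_000
low_dim = pointwise_rate(n, beta=1.0, d=2)    # exponent -1/2, i.e. about 0.01
high_dim = pointwise_rate(n, beta=1.0, d=10)  # exponent -1/6, much slower decay
```

The gap between the two values shows why a bound of this form is sensitive to dimension, and why any saving from sharing layers across time points would matter in practice.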

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper constructs a doubly robust pseudo-outcome whose conditional expectation is shown to identify the time-specific CATE under standard assumptions (correct outcome or propensity model, with censoring accounted for). This is a standard identification strategy rather than a self-definition. Error bounds for pointwise and joint estimation are derived from approximation theory and smoothness conditions on the target functions, without reducing to fitted parameters renamed as predictions. Cross-fitting is used to mitigate overfitting in nuisance estimation, consistent with external doubly robust literature. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled via citation are invoked to force the central result. The derivation remains self-contained against external benchmarks for DR identification in censored data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard identifying assumptions for CATE under right censoring and the doubly robust property that holds when at least one nuisance model is correct. No new entities are postulated. Neural network parameters are implicit but not enumerated as free parameters in the abstract.

axioms (2)
  • domain assumption Standard assumptions for identifying conditional average treatment effects in the presence of right censoring and treatment assignment
    Invoked in the abstract as 'under standard assumptions' for the pseudo-outcome to identify time-specific CATEs.
  • domain assumption Smoothness of treatment effect trajectories over time
    Required for the claim that joint estimation controls error without much additional approximation cost.

pith-pipeline@v0.9.0 · 5546 in / 1454 out tokens · 57575 ms · 2026-05-10T16:35:50.859749+00:00 · methodology
