arxiv: 2605.14275 · v1 · submitted 2026-05-14 · 🧮 math.ST · stat.TH

Recognition: no theorem link

Double/debiased machine learning of quantile treatment effects on long-term outcomes in clinical trials

Ziyang Liu , Niwen Zhou , Peng Wu , Xu Guo

Authors on Pith no claims yet

Pith reviewed 2026-05-15 02:35 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords quantile treatment effectsdoubly robust estimationmachine learningclinical trialssurrogate outcomestransportabilitydata integrationcausal inference

0 comments

The pith

A doubly robust estimator identifies quantile treatment effects on long-term outcomes by linking trial surrogates to external data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to estimate quantile treatment effects on long-term outcomes when those outcomes are unavailable in randomized trials but present in external observational data. It integrates short-term surrogate outcomes from the trial with long-term data from outside sources under randomization, positivity, and a surrogate transportability assumption. The resulting estimator uses machine learning for nuisance functions and stays consistent if either the score-related or outcome regression nuisances are correctly estimated. This supports valid inference on heterogeneous effects across quantiles of the long-term outcome distribution.

Core claim

Under treatment randomization, positivity, and a surrogate-based transportability assumption, we establish identification and develop a doubly robust estimator for inference. The estimator accommodates flexible machine learning methods for nuisance estimation, remains consistent if either the score-related or outcome regression-related nuisance functions are consistently estimated, and is asymptotically normal under regularity conditions.

What carries the argument

Doubly robust score-based estimator for quantile treatment effects that integrates randomized trial data with external observational data via surrogate outcomes under a transportability assumption.

If this is right

The estimator supports arbitrary machine learning methods for nuisance estimation without sacrificing consistency under double robustness.
Asymptotic normality permits construction of confidence intervals and tests for effects at specific quantiles.
The approach applies to real clinical data to uncover treatment effect heterogeneity across the outcome distribution.
Finite-sample behavior is reliable in simulations, enabling practical use beyond average treatment effects.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar double-robust integration could apply to other long-term estimands like survival probabilities if transportability is plausible.
Sensitivity analyses for the transportability assumption could be added to assess robustness in applied settings.
The framework suggests collecting targeted short-term surrogates in trials to enable long-term inference when external data exist.
Efficiency gains might increase by incorporating additional covariates or multiple external sources under extended assumptions.

Load-bearing premise

The surrogate-based transportability assumption that permits linking short-term surrogates observed in the randomized trial to long-term outcomes in the external observational data.

What would settle it

A simulation study or empirical check where the transportability assumption is violated would reveal increased bias or failed coverage in the estimator, while performance remains stable when the assumption holds.

Figures

Figures reproduced from arXiv: 2605.14275 by Niwen Zhou, Peng Wu, Xu Guo, Ziyang Liu.

read the original abstract

Long-term outcomes are often unavailable in randomized clinical trials, although short-term surrogate outcomes are commonly observed. External observational data may contain the long-term outcome, but causal comparisons based on such data alone are vulnerable to confounding. Existing surrogate-based data integration methods for long-term outcomes have focused primarily on average treatment effects. We study estimation of quantile treatment effects for long-term outcomes in the trial population by combining randomized trial data with external observational data. Under treatment randomization, positivity, and a surrogate-based transportability assumption, we establish identification and develop a doubly robust estimator for inference. The estimator accommodates flexible machine learning methods for nuisance estimation, remains consistent if either the score-related or outcome regression-related nuisance functions are consistently estimated, and is asymptotically normal under regularity conditions. Simulation and real-data results demonstrate that the proposed method performs well in finite samples and can reveal heterogeneous long-term treatment effects across quantiles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a doubly robust estimator for quantile long-term treatment effects by fusing RCT surrogate data with external observational long-term data, but the whole thing rests on an untested transportability assumption.

read the letter

The new piece is the extension of double/debiased ML to quantile treatment effects rather than averages, using trial short-term surrogates plus observational long-term outcomes. They lay out identification under randomization, positivity, and the surrogate transportability condition, then build an estimator that stays consistent if either the score or the outcome regression nuisance is correct and is asymptotically normal under standard conditions. Simulations and a real-data example are included to show finite-sample behavior and heterogeneous effects across quantiles.

Referee Report

3 major / 2 minor

Summary. The paper claims that, under treatment randomization, positivity, and a surrogate-based transportability assumption linking short-term surrogates in randomized trial data to long-term outcomes in external observational data, the quantile treatment effect on the long-term outcome is identified in the trial population. It develops a doubly robust estimator that accommodates machine learning for nuisance functions, remains consistent if either the score-related or outcome-regression nuisance is correctly specified, and is asymptotically normal under regularity conditions. Simulation and real-data examples are said to support finite-sample performance and reveal heterogeneous long-term effects across quantiles.

Significance. If the identification and double-robustness results hold, the work would be significant for clinical trials research: long-term outcomes are frequently unavailable within trials, yet external data often contain them; extending double/debiased ML to quantile (rather than mean) effects allows detection of heterogeneous treatment impacts that averages can mask, while the double-robustness property provides protection when flexible ML methods are used for high-dimensional nuisance estimation.

major comments (3)

[Section 2] Section 2 (Identification): The surrogate-based transportability assumption is presented as sufficient to identify the long-term quantile treatment effect, but the manuscript provides neither the explicit derivation mapping the conditional quantile function across data sources nor any sensitivity analysis for violations of this assumption conditional on observed covariates; this step is load-bearing for the entire identification claim.
[Section 3] Section 3 (Estimator): The abstract asserts double robustness and asymptotic normality, yet the explicit form of the doubly robust score function, the influence function, and the precise estimating equation are not displayed; without these, the claimed double-robustness property (consistency if either nuisance is correct) cannot be verified from the text.
[Simulation section] Simulation section: The design details—data-generating processes, how the transportability assumption is enforced or relaxed, and the specific nuisance estimators used—are absent, so the reported finite-sample performance cannot be assessed or reproduced.

minor comments (2)

Notation for the quantile functions and the distinction between trial and external populations could be introduced earlier and used consistently to improve readability.
The real-data application would benefit from a clearer statement of which covariates are used for transportability and any diagnostics for the positivity assumption.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments identify areas where additional explicit derivations, mathematical forms, and simulation details will improve clarity and verifiability. We address each point below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Section 2] Section 2 (Identification): The surrogate-based transportability assumption is presented as sufficient to identify the long-term quantile treatment effect, but the manuscript provides neither the explicit derivation mapping the conditional quantile function across data sources nor any sensitivity analysis for violations of this assumption conditional on observed covariates; this step is load-bearing for the entire identification claim.

Authors: We agree that an expanded derivation and sensitivity analysis will strengthen the identification section. In the revision we will insert a step-by-step derivation that explicitly maps the conditional quantile function from the observational data source to the trial population under the surrogate transportability assumption (conditional on observed covariates). We will also add a dedicated sensitivity analysis subsection that examines the consequences of violations of the transportability assumption. revision: yes
Referee: [Section 3] Section 3 (Estimator): The abstract asserts double robustness and asymptotic normality, yet the explicit form of the doubly robust score function, the influence function, and the precise estimating equation are not displayed; without these, the claimed double-robustness property (consistency if either nuisance is correct) cannot be verified from the text.

Authors: We acknowledge that the explicit score, influence function, and estimating equation were not displayed in sufficient detail. In the revised Section 3 we will present the doubly robust score function, derive the influence function, and state the precise estimating equation. These additions will allow direct verification that the estimator remains consistent when either the score-related nuisance or the outcome-regression nuisance is correctly specified. revision: yes
Referee: [Simulation section] Simulation section: The design details—data-generating processes, how the transportability assumption is enforced or relaxed, and the specific nuisance estimators used—are absent, so the reported finite-sample performance cannot be assessed or reproduced.

Authors: We agree that the simulation design requires fuller documentation for reproducibility. In the revised manuscript we will supply the complete data-generating processes, describe how the transportability assumption is enforced in the primary simulations and relaxed in robustness checks, and specify the machine-learning methods and tuning procedures used for each nuisance function. revision: yes

Circularity Check

0 steps flagged

No circularity; identification and estimator follow from stated assumptions plus standard doubly-robust construction

full rationale

The derivation begins from three explicit maintained assumptions (randomization, positivity, surrogate transportability) that are external to the paper's own fitted quantities. Identification of the long-term quantile treatment effect is obtained directly from these assumptions by standard g-computation or inverse-probability weighting arguments; the doubly-robust estimator is then assembled from the resulting efficient influence function using off-the-shelf machine-learning nuisance estimators. No equation in the abstract or described chain equates a target parameter to a fitted nuisance function by definition, renames a known result, or invokes a self-citation whose content is itself unverified. The transportability assumption is an input, not an output, so the estimator's consistency claim remains falsifiable against external data and does not collapse to a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on three standard causal assumptions plus the key transportability condition; no free parameters or new entities are introduced in the abstract.

axioms (3)

domain assumption treatment randomization
Invoked to identify causal effects from the trial data.
domain assumption positivity
Required for well-defined causal contrasts in both data sources.
domain assumption surrogate-based transportability assumption
The load-bearing link between short-term trial surrogates and long-term observational outcomes.

pith-pipeline@v0.9.0 · 5453 in / 1425 out tokens · 35796 ms · 2026-05-15T02:35:42.644855+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1]

2018 , publisher=

Double/debiased machine learning for treatment and structural parameters , author=. 2018 , publisher=

work page 2018
[2]

Gaussian approximation of suprema of empirical processes , author=

work page
[3]

Handbook of econometrics , volume=

Large sample estimation and hypothesis testing , author=. Handbook of econometrics , volume=. 1994 , publisher=

work page 1994
[4]

1996 , publisher=

Weak convergence , author=. 1996 , publisher=

work page 1996
[5]

Biometrika , volume=

Inverting estimating equations for causal inference on quantiles , author=. Biometrika , volume=. 2025 , publisher=

work page 2025
[6]

Transactions on Machine Learning Research , issn=

Doubly Robust Uncertainty Quantification for Quantile Treatment Effects in Sequential Decision Making , author=. Transactions on Machine Learning Research , issn=. 2025 , url=

work page 2025
[7]

Review of Economic Studies , pages=

The surrogate index: Combining short-term proxies to estimate long-term treatment effects more rapidly and precisely , author=. Review of Economic Studies , pages=. 2025 , publisher=

work page 2025
[8]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Long-term causal inference under persistent confounding via data combination , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2025 , publisher=

work page 2025
[9]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

On the role of surrogates in the efficient estimation of treatment effects with limited outcome data , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2025 , publisher=

work page 2025
[10]

2025 , eprint=

Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects , author=. 2025 , eprint=

work page 2025
[11]

Journal of Econometrics , volume=

Semiparametric estimation of long-term treatment effects , author=. Journal of Econometrics , volume=. 2023 , publisher=

work page 2023
[12]

Journal of Machine Learning Research , volume=

Localized debiased machine learning: Efficient inference on quantile treatment effects and beyond , author=. Journal of Machine Learning Research , volume=

work page
[13]

Causal Inference: What If , year =

Hern. Causal Inference: What If , year =

work page
[14]

Biometrika , volume=

The central role of the propensity score in observational studies for causal effects , author=. Biometrika , volume=. 1983 , publisher=

work page 1983
[15]

Econometrica , volume=

Efficient semiparametric estimation of quantile treatment effects , author=. Econometrica , volume=. 2007 , publisher=

work page 2007
[16]

Journal of business & economic statistics , volume=

Unconditional quantile treatment effects under endogeneity , author=. Journal of business & economic statistics , volume=. 2013 , publisher=

work page 2013
[17]

Biometrics , volume=

Causal inference on quantiles with an obstetric application , author=. Biometrics , volume=. 2012 , publisher=

work page 2012
[18]

Journal of Statistical Planning and Inference , volume=

Efficient estimation of quantiles in missing data models , author=. Journal of Statistical Planning and Inference , volume=. 2017 , publisher=

work page 2017
[19]

Biometrics , volume=

Doubly robust estimation and sensitivity analysis for marginal structural quantile models , author=. Biometrics , volume=. 2024 , publisher=

work page 2024
[20]

The American Statistician , volume=

Demystifying statistical learning based on efficient influence functions , author=. The American Statistician , volume=. 2022 , publisher=

work page 2022
[21]

Biometrics , volume=

Estimation of the optimal surrogate based on a randomized trial , author=. Biometrics , volume=. 2018 , publisher=

work page 2018
[22]

Biostatistics , volume=

Doubly robust evaluation of high-dimensional surrogate markers , author=. Biostatistics , volume=. 2023 , publisher=

work page 2023
[23]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Robust evaluation of longitudinal surrogate markers with censored data , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2025 , publisher=

work page 2025
[24]

Statistica Sinica , volume=

IDENTIFICATION AND ESTIMATION OF TREATMENT EFFECTS ON LONG-TERM OUTCOMES IN CLINICAL TRIALS WITH EXTERNAL OBSERVATIONAL DATA , author=. Statistica Sinica , volume=

work page
[25]

Journal of the American Statistical Association , volume=

Long-term effect estimation when combining clinical trial and observational follow-up datasets , author=. Journal of the American Statistical Association , volume=. 2026 , publisher=

work page 2026
[26]

American journal of kidney diseases , volume=

Histologic subclassification of IgA nephropathy: a clinicopathologic study of 244 cases , author=. American journal of kidney diseases , volume=. 1997 , publisher=

work page 1997
[27]

American Journal of Kidney Diseases , volume=

Effects of hydroxychloroquine on proteinuria in IgA nephropathy: a randomized controlled trial , author=. American Journal of Kidney Diseases , volume=. 2019 , publisher=

work page 2019
[28]

BioMed Research International , volume=

Effects of Hydroxychloroquine on Proteinuria in IgA Nephropathy: A Systematic Review and Meta-Analysis , author=. BioMed Research International , volume=. 2021 , publisher=

work page 2021
[29]

American journal of nephrology , volume=

Effects of hydroxychloroquine on proteinuria in immunoglobulin A nephropathy , author=. American journal of nephrology , volume=. 2018 , publisher=

work page 2018
[30]

Clinical Journal of the American Society of Nephrology , volume=

Pathologic predictors of renal outcome and therapeutic efficacy in IgA nephropathy: validation of the oxford classification , author=. Clinical Journal of the American Society of Nephrology , volume=. 2011 , publisher=

work page 2011
[31]

, author=

Estimating causal effects of treatments in randomized and nonrandomized studies. , author=. Journal of educational Psychology , volume=. 1974 , publisher=

work page 1974
[32]

Essay on principles

On the application of probability theory to agricultural experiments. Essay on principles. Section 9 , author=. Statistical Science , pages=. 1990 , publisher=

work page 1990
[33]

The Annals of Statistics , volume=

Generalized random forests , author=. The Annals of Statistics , volume=

work page