arxiv: 2605.05051 · v1 · submitted 2026-05-06 · 📊 stat.ME


Impossibility of Distribution-Free Predictive Inference for Individual Treatment Effects

Chongguang Tao, Zheng Zhou, Yuhong Yang


Pith reviewed 2026-05-08 16:24 UTC · model grok-4.3

classification 📊 stat.ME

keywords: individual treatment effects, distribution-free inference, conformal prediction, causal inference, impossibility results, conditional independence, prediction sets


The pith

Distribution-free prediction sets for individual treatment effects must be trivial with infinite expected length when covariates are continuous.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that under standard causal assumptions of strong ignorability and overlap, no non-trivial distribution-free prediction set can achieve valid coverage for individual treatment effects when covariates are continuous. Both finite-sample and asymptotic impossibility results are shown, with the argument relying on a reduction of valid ITE coverage to the hardness of conditional independence testing. A sympathetic reader cares because this exposes why conformal-style methods cannot deliver finite-length sets in this setting without further structure, clarifying the limits imposed by the missing counterfactuals in causal data.

Core claim

Any distribution-free prediction set achieving desired coverage for individual treatment effects must have infinite expected length in the presence of continuous covariates. This follows from finite-sample and asymptotic impossibility results that connect ITE prediction-set validity to the hardness of conditional independence testing between treatment and potential outcomes given the covariates, under the missing-data structure of causal inference.
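The claim can be written down roughly as follows. This is a sketch under assumed notation: the symbols $\tau$, $\hat{C}_n$, $\mathcal{P}$, and the marginal form of coverage are illustrative choices rather than the paper's own, and the exact quantifiers are those of the paper's theorems.

```latex
% Assumed setup: i.i.d. observations (X_i, T_i, Y_i), i = 1, ..., n, with
% Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0), and ITE \tau_i = Y_i(1) - Y_i(0).
% \mathcal{P}: all distributions with continuous X satisfying
% strong ignorability and overlap (assumed description of the class).
\[
  \inf_{P \in \mathcal{P}}
  P\bigl( \tau_{n+1} \in \hat{C}_n(X_{n+1}) \bigr) \;\ge\; 1 - \alpha
  \quad \Longrightarrow \quad
  \mathbb{E}_P\bigl[ \mathrm{Leb}\bigl( \hat{C}_n(X_{n+1}) \bigr) \bigr] = \infty ,
\]
% where Leb denotes Lebesgue measure (the "length" of the set).
```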

What carries the argument

The reduction of ITE prediction-set validity to the hardness of conditional independence testing under continuous covariates.

Load-bearing premise

That any valid distribution-free ITE prediction set reduces to performing a conditional independence test that cannot succeed with continuous covariates.

What would settle it

A construction or empirical instance of a finite-length distribution-free prediction set for ITEs that achieves the target coverage under continuous covariates and standard causal assumptions would falsify the result.
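As a minimal sketch of how a candidate method could be probed in simulation, where both potential outcomes are known by construction, the helper below reports empirical coverage and mean length. The function name `coverage_and_length` and the data-generating process are hypothetical, and the "candidate" shown is an oracle interval that uses the true ITE distribution, so it is not distribution-free. A check like this can only give evidence about one specific distribution; it cannot establish or refute distribution-free validity on its own.

```python
import numpy as np

def coverage_and_length(lower, upper, ite):
    """Empirical coverage and mean length of intervals [lower, upper] for the true ITEs."""
    lower, upper, ite = map(np.asarray, (lower, upper, ite))
    covered = (lower <= ite) & (ite <= upper)
    return covered.mean(), (upper - lower).mean()

# Hypothetical simulation: both potential outcomes are generated, so the ITE is known.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(size=n)
y0 = x + rng.normal(size=n)
y1 = 1.0 + x + rng.normal(size=n)
ite = y1 - y0  # distributed as N(1, 2) in this design

# Oracle 90% interval built from the known N(1, 2) law (NOT distribution-free).
half_width = 1.645 * np.sqrt(2.0)
cov, length = coverage_and_length(
    np.full(n, 1.0 - half_width), np.full(n, 1.0 + half_width), ite
)

# The trivial set (the whole real line) always covers but has infinite length.
cov_trivial, length_trivial = coverage_and_length(
    np.full(n, -np.inf), np.full(n, np.inf), ite
)
print(f"oracle: coverage={cov:.3f}, length={length:.3f}")
print(f"trivial: coverage={cov_trivial:.3f}, length={length_trivial}")
```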

Figures

Figures reproduced from arXiv: 2605.05051 by Chongguang Tao, Yuhong Yang, Zheng Zhou.

**Figure 1.** CDFs of the DR pseudo score and the true score in a simple setting with known regression

**Figure 2.** Top: realized coverage versus nominal target coverage under the beta propensity score

**Figure 3.** Empirical coverage of realized ITE intervals under the beta propensity score setting with

**Figure 4.** Realized coverage versus nominal target coverage when

**Figure 5.** Realized coverage versus nominal target coverage in the checkerboard stress-test design.

**Figure 6.** Visualization of the checkerboard stress-test design. Left: cells in which the control

Original abstract

Uncertainty quantification for individual treatment effects (ITEs) is a daunting challenge in causal inference. Motivated by recent advances in conformal prediction, several works aim to construct distribution-free prediction sets for ITEs with desired coverage under standard assumptions such as strong ignorability and overlap. In this paper, we show that such goals are fundamentally unattainable in the presence of continuous covariates. Specifically, we establish finite-sample and asymptotic impossibility results demonstrating that any distribution-free prediction set achieving desired coverage for ITEs must be trivial, in the sense that it has infinite expected length. Our analysis relies on a connection between ITE inference and the hardness of conditional independence testing, and highlights the intrinsic limitations imposed by the missing data nature of causal inference. These results provide a new perspective on existing methods, clarifying that their apparent success necessarily relies on additional structural assumptions beyond standard causal assumptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that distribution-free prediction sets for individual treatment effects must have infinite expected length under continuous covariates and standard causal assumptions.


The main point is that any distribution-free prediction set for individual treatment effects with valid coverage has to be trivial, meaning infinite expected length, when covariates are continuous. This holds in both finite samples and asymptotically under strong ignorability and overlap. The argument works by reducing the coverage requirement to the known hardness of nonparametric conditional independence testing, which captures the missing-data structure since only one potential outcome is seen per unit.
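The missing-data point can be illustrated with a small simulation. This is a standard textbook-style example and not the paper's construction: two hypothetical worlds share the same distribution of observed data $(X, T, Y)$, with randomized treatment so that ignorability and overlap hold in both, yet their ITE distributions differ sharply. The function name `simulate` and the two couplings are assumptions made for illustration. By itself this only shows that the observed data do not pin down the joint law of the potential outcomes; it does not yield the paper's infinite-length conclusion.

```python
import numpy as np

def simulate(coupling, n=20000, seed=0):
    """Draw (x, t, y_obs, ite) under a chosen coupling of the potential outcomes."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    t = rng.binomial(1, 0.5, size=n)  # randomized treatment, independent of everything
    eps = rng.normal(size=n)
    y0 = x + eps
    if coupling == "comonotone":
        y1 = x + eps                  # same noise: ITE is exactly 0
    elif coupling == "antithetic":
        y1 = x - eps                  # flipped noise: ITE = -2 * eps
    else:
        raise ValueError(f"unknown coupling: {coupling}")
    y_obs = np.where(t == 1, y1, y0)  # only one potential outcome is observed
    return x, t, y_obs, y1 - y0

x_a, t_a, y_a, ite_a = simulate("comonotone")
x_b, t_b, y_b, ite_b = simulate("antithetic")

# Observed outcomes among the treated have (nearly) the same quantiles in both worlds...
qs = np.linspace(0.1, 0.9, 9)
max_gap = np.max(
    np.abs(np.quantile(y_a[t_a == 1], qs) - np.quantile(y_b[t_b == 1], qs))
)
# ...while the spread of the ITE differs completely.
print(f"max quantile gap among treated: {max_gap:.3f}")
print(f"ITE std: comonotone={ite_a.std():.3f}, antithetic={ite_b.std():.3f}")
```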

Referee Report

0 major / 3 minor

Summary. The manuscript proves finite-sample and asymptotic impossibility results for distribution-free prediction sets for individual treatment effects (ITEs). Under strong ignorability, overlap, and continuous covariates, any prediction set achieving valid coverage must be trivial (infinite expected length). The argument reduces ITE coverage validity to the known hardness of nonparametric conditional independence testing, exploiting the missing-data structure where only one potential outcome is observed per unit.

Significance. If the reduction holds, the result is significant for causal inference and conformal prediction. It rigorously demonstrates intrinsic limitations of distribution-free ITE methods without further assumptions, explaining why recent conformal approaches for causal effects require implicit structure. The connection to conditional independence testing is a clear strength, providing a clean proof strategy that leverages established statistical impossibilities and the fundamental missing-data problem in causal settings. This offers a valuable theoretical clarification for the field.

minor comments (3)

[Abstract and §1] The abstract and introduction could more explicitly distinguish marginal coverage from conditional coverage when defining 'valid coverage' for the random ITE, as this distinction is load-bearing for the reduction.
[§5] In the discussion of implications, add a brief paragraph on how the result interacts with common practical relaxations such as smoothness or parametric assumptions on the outcome models.
[§2] Notation for the prediction set C_n(x, t) and the ITE random variable should be introduced with a dedicated notation subsection or table to improve readability for readers unfamiliar with the conformal literature.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary and significance assessment of our impossibility results for distribution-free ITE prediction sets. We appreciate the recognition of the connection to conditional independence testing and the missing-data structure in causal inference. The referee recommends minor revision, but no specific major comments were raised.

Circularity Check

0 steps flagged

No significant circularity identified


The paper establishes an impossibility result for distribution-free ITE prediction sets by reducing the coverage requirement to the known statistical hardness of finite-sample or consistent nonparametric conditional independence testing, under standard assumptions (strong ignorability, overlap, continuous covariates). This reduction exploits the missing-data structure of potential outcomes and does not rely on any fitted parameters, self-definitional equations, or load-bearing self-citations. The central claim is an external reduction to a pre-existing result in nonparametric statistics rather than a renaming, ansatz, or construction that collapses to the paper's own inputs. The derivation is therefore self-contained.
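For orientation, the pre-existing hardness result referred to here (Shah and Peters, 2020) can be paraphrased as below. The notation is assumed for this sketch, and the mapping to the causal setting (testing $T \perp (Y(1), Y(0)) \mid X$) is only indicated, not reproduced from the paper's proof.

```latex
% \mathcal{E}_0: laws of (X, Y, Z) that are absolutely continuous w.r.t.
% Lebesgue measure; \mathcal{P}_0 \subset \mathcal{E}_0: those with X \perp Y \mid Z.
% \psi_n: any test based on n i.i.d. observations (assumed notation).
\[
  \sup_{P \in \mathcal{P}_0} \mathbb{E}_P[\psi_n] \le \alpha
  \quad \Longrightarrow \quad
  \mathbb{E}_Q[\psi_n] \le \alpha
  \quad \text{for every } Q \in \mathcal{E}_0 ,
\]
% i.e. a test that controls size over the whole null has no power
% beyond its size against any alternative.
```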

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard causal assumptions of strong ignorability and overlap together with the mathematical fact that conditional independence testing is hard for continuous covariates; no free parameters or new entities are introduced.

axioms (2)

Domain assumption: Strong ignorability and overlap hold. Explicitly invoked as the setting under which impossibility is shown.
Domain assumption: Covariates are continuous. Required for the impossibility to hold; stated as the key condition.



Reference graph

Works this paper leans on

26 extracted references · 2 canonical work pages

[1] David M Kent, Ewout Steyerberg, and David Van Klaveren. Personalized evidence based medicine: predictive approaches to heterogeneous treatment effects. BMJ, 363, 2018.
[2] Toru Kitagawa and Aleksey Tetenov. Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica, 86(2):591–616, 2018.
[3] Susan Athey and Stefan Wager. Policy learning with observational data. Econometrica, 89(1):133–161, 2021.
[4] Jennifer L Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011.
[5] Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.
[6] Sören R Künzel, Jasjeet S Sekhon, Peter J Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 116(10):4156–4165, 2019.
[7] Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021.
[8] Edward H Kennedy. Towards optimal doubly robust estimation of heterogeneous causal effects. Electronic Journal of Statistics, 17(2):3008–3049, 2023.
[9] Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(Mar):371–421, 2008.
[10] Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman. Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523):1094–1111, 2018.
[11] Anastasios N. Angelopoulos and Stephen Bates. Conformal prediction: A gentle introduction. Found. Trends Mach. Learn., 16(4):494–591, March 2023. ISSN 1935-8237. doi: 10.1561/2200000101. URL https://doi.org/10.1561/2200000101
[12] Anastasios N. Angelopoulos, Rina Foygel Barber, and Stephen Bates. Theoretical foundations of conformal prediction, 2026. URL https://arxiv.org/abs/2411.11824
[13] Lihua Lei and Emmanuel J Candès. Conformal inference of counterfactuals and individual treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):911–938, 2021.
[14] Ying Jin, Zhimei Ren, and Emmanuel J Candès. Sensitivity analysis of individual treatment effects: A robust conformal inference approach. Proceedings of the National Academy of Sciences, 120(6):e2214889120, 2023.
[15] Ahmed M Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. Advances in Neural Information Processing Systems, 36:47682–47703, 2023.
[16] Baozhen Wang and Xingye Qiao. Conformal inference of individual treatment effects using conditional density estimates. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 21009–21016, 2025.
[17] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.
[18] Rajen D. Shah and Jonas Peters. The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics, 48(3):1514–1538, 2020.
[19] Yachong Yang, Arun Kumar Kuchibhotla, and Eric Tchetgen Tchetgen. Doubly robust calibration of prediction sets under covariate shift. Journal of the Royal Statistical Society Series B: Statistical Methodology, 86(4):943–965, 2024.
[20] Paul W Holland. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960, 1986.
[21] Charles J Stone. Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, pages 1040–1053, 1982.
[22] Xiaohong Chen and Timothy M Christensen. Optimal uniform convergence rates and asymptotic normality for series estimators under weak dependence and weak conditions. Journal of Econometrics, 188(2):447–465, 2015.
[23] Alexander D’Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics, 221(2):644–654, 2021.
[24] James M Robins and Ya’acov Ritov. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine, 16(3):285–319, 1997.

A Proof of Main Theorems

We provide the proofs of the main theorems in Section 2 in the Appendix. The Appendix is organized as follows: A.1. we state technical lemmas on the hard...

The marginal of(X, T)is preserved:Q X,T =P X,T
[26]

nest”,exact = FALSE, and CQR. • Nest-Exact: the exact nested conformal ITE method implemented by cfcausal::conformalItewithalgo = “nest

The conditional distribution of(Y(1), Y(0))given(X, T)is QY(1),Y(0)|X,T=t = ( L Y(0) +y, Y(0) , Y(0)∼P Y(0)|X,T=0 , t= 0, L Y(1), Y(1)−y , Y(1)∼P Y(1)|X,T=1 , t= 1, whereL(·)denotes the law of a random vector. Under this construction, Q(X,T,Y) =P (X,T,Y) , and, moreover, QY(1)|X,T=1 =P Y(1)|X,T=1 , Q Y(0)|X,T=0 =P Y(0)|X,T=0 , and QY(1)−Y(0)|X,T=1 =Q Y(1)...