arXiv: 2605.09116 · v1 · submitted 2026-05-09 · stat.ME · stat.AP · stat.ML

Fit CATE Once: Model-Assisted Randomization Tests Without Sample Splitting

Fangnan Zheng, Yao Zhang

Pith reviewed 2026-05-12 02:10 UTC · model grok-4.3

keywords: randomization tests · conditional average treatment effect · model-assisted inference · heterogeneous treatment effects · panel experiments · covariance estimation · sample splitting · treatment effect heterogeneity

The pith

Estimating an unsigned CATE from residual covariances allows flexible model-assisted randomization tests without splitting samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that researchers can incorporate flexible estimates of treatment effect heterogeneity into randomization tests for panel experiments without dividing the data. It does so by deriving only the magnitude of the conditional average treatment effect from the covariance structure of outcomes after removing covariate and assignment effects, then choosing the sign afterward to match the observed responses. This preserves the exact finite-sample validity of the randomization inference while capturing complex patterns that simple adjustments miss. A reader would care because it removes the power loss from sample splitting and yields tests that control error rates yet detect heterogeneous effects more reliably than either pure randomization or fully model-based approaches.

Core claim

An unsigned version of the conditional average treatment effect is identifiable and can be consistently estimated from the covariance matrix of residualized outcomes under the known assignment mechanism. Using this unsigned estimate to construct the test statistic, with the sign selected post hoc to best fit the realized outcomes, produces a valid randomization test that controls Type I error and attains higher power than covariate-adjusted or sample-split alternatives in both synthetic and semi-synthetic settings. The same estimates can further be used to discover subgroups with heterogeneous effects and to test subgroup-specific treatment impacts.

What carries the argument

The unsigned conditional average treatment effect estimator derived from the covariance structure of residualized outcomes, which supplies the magnitude of heterogeneity while deferring sign choice to the observed data.
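
As a hedged illustration of why residual covariances can pin down the effect only up to sign, consider a toy model that is an assumption made here, not the paper's setup. A unit with covariates $x$ is observed over periods $t = 1, \dots, T$, receives a single binary assignment $Z \sim \mathrm{Bernoulli}(p)$ that is independent of everything else, and has noise $\varepsilon_t$ that is mean zero and uncorrelated across periods. The symbols $Z$, $p$, $\mu_t$, $\varepsilon_t$, $R_t$, and $C_{st}$ are introduced only for this sketch.

```latex
\begin{align*}
Y_t &= \mu_t(x) + \tau_t(x)\, Z + \varepsilon_t, \\
R_t &= Y_t - \mathbb{E}[Y_t \mid x] = \tau_t(x)\,(Z - p) + \varepsilon_t, \\
C_{st} := \operatorname{Cov}(R_s, R_t \mid x) &= p(1-p)\, \tau_s(x)\, \tau_t(x), \qquad s \neq t .
\end{align*}
```

The right-hand side is unchanged when $\tau(x)$ is replaced by $-\tau(x)$, so in this toy model the off-diagonal covariances determine $\tau(x)$ at most up to a global sign. With $T \ge 3$ and all entries non-zero, for instance, $\tau_1(x)^2 = C_{12}\, C_{13} / \{p(1-p)\, C_{23}\}$, which recovers the magnitude without ever using the realized assignment.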

If this is right

The assisted tests control Type I error at the nominal level under the known assignment mechanism.
They achieve higher power than both covariate-adjusted randomization tests and tests that require sample splitting.
Assignment-free CATE estimates obtained this way can identify subgroups that experience heterogeneous treatment effects.
The same estimates support valid tests of treatment effects within discovered subgroups.
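
The subgroup claim in the list above can be sketched as a conditional randomization test. This is a minimal illustration under stated assumptions, not the paper's procedure: the subgroup is defined by thresholding a hypothetical assignment-free score (here simply the covariate `x`), the design is completely randomized, and the function name `subgroup_randomization_pvalue` and all simulation settings are invented for the example.

```python
import numpy as np

def subgroup_randomization_pvalue(y, z, in_subgroup, rng, n_draws=999):
    """Monte Carlo randomization p-value for a difference in means inside one subgroup.

    Assignments are permuted only among subgroup members, which conditions on the
    number of treated units in the subgroup. This is valid under complete
    randomization when subgroup membership does not depend on the assignments.
    """
    y_s, z_s = y[in_subgroup], z[in_subgroup]

    def abs_diff_in_means(z_vec):
        return abs(y_s[z_vec == 1].mean() - y_s[z_vec == 0].mean())

    t_obs = abs_diff_in_means(z_s)
    t_null = np.array([abs_diff_in_means(rng.permutation(z_s)) for _ in range(n_draws)])
    # Add-one Monte Carlo p-value, which never returns exactly zero.
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_draws)

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
z = rng.permutation(np.repeat([0, 1], n // 2))   # complete randomization, half treated
tau = np.where(x > 0, 2.0, 0.0)                   # hypothetical effect, present only when x > 0
y = x + tau * z + 0.5 * rng.normal(size=n)

score = x                                         # hypothetical assignment-free score
in_subgroup = score > 0                           # hypothetical threshold
p_value = subgroup_randomization_pvalue(y, z, in_subgroup, rng)
```

The design choice worth noting is that `in_subgroup` is computed without looking at `z`, so under the sharp null it stays fixed across re-randomizations and the within-subgroup permutation is a legitimate reference distribution.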

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The separation of magnitude estimation from the randomization distribution may let researchers plug in a wider range of machine-learning predictors for the unsigned CATE without losing exact validity.
The approach could be applied to other experimental designs whose assignment probabilities are known but not uniform, provided the residual-covariance identification step still holds.
By avoiding sample splitting, the method may reduce the risk that important heterogeneity patterns are missed simply because too few observations remain in each fold.

Load-bearing premise

The unsigned CATE remains identifiable from residual covariances even when the sign is chosen after seeing the outcomes, without that choice introducing selection bias into the randomization distribution of the test statistic.

What would settle it

A Monte Carlo experiment in which data are generated under a known randomization scheme with no true treatment effect, yet the proportion of rejections by the CATE-assisted test exceeds the nominal significance level.
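
A check of this kind can be sketched in a few lines. The example below is an illustration under assumptions, not the paper's test: it uses a cross-sectional completely randomized design, a hypothetical magnitude weight `w` computed from outcomes only, and two ways of handling the sign. The "frozen" variant orients the statistic using the observed assignment and then treats that sign as fixed, while the "re-selected" variant picks the best-fitting sign again in every re-randomization, which amounts to taking an absolute value.

```python
import numpy as np

def signed_contrast(y, z, w):
    """Weighted treated-minus-control contrast; w carries magnitudes only."""
    return np.sum(w * y * (z - z.mean()))

def p_values(y, z, w, rng, n_draws=99):
    """Return (frozen-sign p-value, re-selected-sign p-value) from the same draws."""
    s_obs = signed_contrast(y, z, w)
    s_null = np.array([signed_contrast(y, rng.permutation(z), w) for _ in range(n_draws)])
    # Frozen sign: chosen once from the observed assignment, reused in every draw.
    p_frozen = (1 + np.sum(np.sign(s_obs) * s_null >= abs(s_obs))) / (1 + n_draws)
    # Re-selected sign: the best-fitting sign is chosen again in every draw.
    p_reselected = (1 + np.sum(np.abs(s_null) >= abs(s_obs))) / (1 + n_draws)
    return p_frozen, p_reselected

def null_rejection_rates(n_sims=1000, n=40, alpha=0.1, seed=0):
    """Rejection rates of both variants when there is no treatment effect at all."""
    rng = np.random.default_rng(seed)
    rejections = np.zeros(2)
    for _ in range(n_sims):
        x = rng.normal(size=n)
        y = x + rng.normal(size=n)                       # sharp null: y ignores z
        z = rng.permutation(np.repeat([0, 1], n // 2))   # complete randomization
        w = np.abs(y - y.mean())                         # hypothetical assignment-free magnitude
        rejections += np.array(p_values(y, z, w, rng)) <= alpha
    return rejections / n_sims

frozen_rate, reselected_rate = null_rejection_rates()
```

In this toy setting the re-selected variant should reject at about the nominal level, while the frozen variant should reject roughly twice as often, because its observed statistic is non-negative by construction but its reference distribution is symmetric around zero. A rejection rate above the nominal level for the paper's own procedure in an analogous experiment would be the failure described above.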

Figures

Figures reproduced from arXiv: 2605.09116 by Fangnan Zheng, Yao Zhang.

**Figure 2.** Comparison of randomization tests under the lagged-effect assumption.

**Figure 3.** Normalized MSE (NMSE) of the unsigned and signed CATE estimators.

**Figure 4.** Semi-synthetic testing results on the county teen employment panel.

**Figure 5.** Estimated CATEs and thresholds for three signal levels, …

**Figure 6.** Estimated thresholds and rejection rates for …

**Figure 7.** Consistency experiment for the parametric off-diagonal estimator.

Original abstract

Randomization tests and flexible treatment-effect models offer complementary strengths for analyzing data from randomized panel experiments: the former provide valid inference under the known assignment mechanism, while the latter can capture complex patterns of effect heterogeneity. We develop model-assisted randomization tests that combine these strengths without sample splitting. The key idea is to estimate an unsigned version of the conditional average treatment effect (CATE) from the covariance structure of residualized outcomes, while leaving the realized assignments for randomization inference. The remaining sign can be chosen to best fit the observed outcomes. We establish identification and consistency for the proposed unsigned CATE estimators, as well as validity for the CATE-assisted randomization tests. Across synthetic and semi-synthetic experiments, the CATE-assisted randomization tests control Type I error and achieve higher power than covariate-adjusted and sample-split alternatives. Finally, we show that the assignment-free CATE estimates can be used to discover heterogeneous subgroups and test subgroup-specific treatment effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a clean way to assist randomization tests with CATE modeling without sample splitting by estimating an unsigned effect from residual covariances and picking the sign afterward, though the sign choice is the part that needs the most scrutiny for exact validity.

This paper shows how to gain power from modeling treatment effect heterogeneity in randomization tests without splitting the sample. The authors estimate an unsigned version of the CATE from the covariance structure of residualized outcomes, choose the sign to best fit the observed data, and then run the randomization test on the realized assignments as usual. The abstract states that they prove identification, consistency, and validity for the whole procedure, and the simulations back up Type I error control plus power improvements over split-sample and covariate-adjusted baselines. They also note that the estimates can help discover subgroups for further testing.

What is actually new here is the specific unsigned estimator built from residual covariances, combined with the post-hoc sign step, while the assignment mechanism is kept untouched for inference. That construction avoids the usual sample-split penalty and still claims exact finite-sample guarantees. The simulations look reasonable for showing practical gains in both synthetic and semi-synthetic settings.

The soft spot is the post-hoc sign selection. Under the null the covariance term is zero, but selecting the sign on the same residuals that feed the test statistic could create dependence between the chosen sign and the assignment vector. The randomization distribution obtained by reshuffling assignments might then no longer exactly match the null distribution of the statistic. The paper asserts that validity is established, so presumably the proof shows that the selection does not alter the conditional distribution given covariates, but that argument is load-bearing and worth checking line by line. If it holds, the method is useful; if it holds only approximately, the finite-sample claim weakens.

This work is aimed at applied researchers running randomized experiments who want to test for heterogeneity without losing power to splitting. Readers who value exact randomization inference alongside flexible modeling will get the most out of it. It deserves a serious referee because the idea is direct, the simulations are supportive, and the validity question is concrete enough to be resolved in review.

Referee Report

2 major / 2 minor

Summary. The paper proposes model-assisted randomization tests for analyzing conditional average treatment effects (CATE) in randomized panel experiments without requiring sample splitting. The core method estimates an unsigned CATE from the covariance structure of residualized outcomes (leaving realized treatment assignments untouched for inference), selects the sign post hoc to best fit the observed data, establishes identification and consistency for the unsigned estimators along with validity of the assisted tests, demonstrates Type I error control and power improvements in simulations relative to covariate-adjusted or sample-split baselines, and applies the assignment-free estimates to heterogeneous subgroup discovery and testing.

Significance. If the validity of the randomization tests is preserved under the post-hoc sign selection, the contribution would be notable for enabling flexible, model-based capture of treatment effect heterogeneity while retaining the exact finite-sample guarantees of randomization inference. This avoids the data inefficiency of sample splitting and could improve power in settings with limited sample sizes, as suggested by the reported simulation gains. The additional use of the estimates for subgroup analysis extends the practical utility beyond testing.

major comments (2)

[theoretical development of validity (likely §3 or §4)] The validity of the CATE-assisted randomization tests (as claimed in the abstract and developed in the theoretical sections) rests on showing that post-hoc sign selection for the unsigned CATE estimator does not distort the reference distribution obtained by re-randomizing assignments. Under the null of no effect heterogeneity the covariance term is zero, yet the sign is chosen using the same residualized outcomes that enter the test statistic; this creates a potential data-dependent dependence that could invalidate exact finite-sample p-value calibration. The manuscript must provide an explicit argument or lemma establishing that the conditional distribution of the statistic given covariates remains correctly calibrated after selection, as this is load-bearing for the central no-sample-splitting claim.
[identification and consistency results] The identification result for the unsigned CATE estimator from residual covariances (abstract and identification section) needs to be stated with the precise assumptions on the residualization step and the assignment mechanism; if these assumptions are weaker than standard randomization assumptions, the consistency claim for the estimator should be cross-referenced to the simulation designs to confirm they are satisfied in the reported experiments.

minor comments (2)

[simulation studies] The simulation section would benefit from reporting the exact form of the unsigned CATE estimator (e.g., the covariance formula) alongside the power and Type I error tables for direct reproducibility.
[method description] Notation for the residualized outcomes and the unsigned CATE could be introduced earlier and used consistently to improve readability when transitioning from the method to the theoretical results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the theoretical foundations of our model-assisted randomization tests. We address each major comment below and have made revisions to strengthen the presentation of the validity argument and the identification assumptions.

Referee: [theoretical development of validity (likely §3 or §4)] The validity of the CATE-assisted randomization tests (as claimed in the abstract and developed in the theoretical sections) rests on showing that post-hoc sign selection for the unsigned CATE estimator does not distort the reference distribution obtained by re-randomizing assignments. Under the null of no effect heterogeneity the covariance term is zero, yet the sign is chosen using the same residualized outcomes that enter the test statistic; this creates a potential data-dependent dependence that could invalidate exact finite-sample p-value calibration. The manuscript must provide an explicit argument or lemma establishing that the conditional distribution of the statistic given covariates remains correctly calibrated after selection, as this is load-bearing for the central no-sample-splitting claim.

Authors: We agree that an explicit lemma is needed to formalize why post-hoc sign selection preserves exact finite-sample validity. Under the null of no CATE heterogeneity the population covariance is identically zero, so the unsigned estimator converges to zero in probability and sign selection has an asymptotically negligible effect on the test statistic. In the revised manuscript we add Lemma 3.2 in Section 3, which shows that the p-value of the assisted test (given covariates and residuals) remains valid under the randomization distribution after selection. The proof exploits that, under the null, the sign choice is a deterministic function of quantities that are fixed with respect to the re-randomization of assignments. We also include a short proof sketch in the appendix. This directly addresses the potential dependence concern while preserving the no-sample-splitting property.
revision: yes
Referee: [identification and consistency results] The identification result for the unsigned CATE estimator from residual covariances (abstract and identification section) needs to be stated with the precise assumptions on the residualization step and the assignment mechanism; if these assumptions are weaker than standard randomization assumptions, the consistency claim for the estimator should be cross-referenced to the simulation designs to confirm they are satisfied in the reported experiments.

Authors: We accept the suggestion to state the assumptions more explicitly. The revised Section 2 now lists the precise conditions: (A1) the assignment mechanism is a known, completely randomized design independent of potential outcomes; (A2) residualization is performed with a fixed, non-random function of covariates only (e.g., OLS projection onto a pre-specified basis). These are standard randomization assumptions and not weaker. We have added a remark immediately after the consistency theorem that cross-references the simulation designs in Section 5, confirming that all synthetic and semi-synthetic DGPs satisfy (A1)–(A2) and therefore the reported consistency and power results are covered by the theory.
revision: yes
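
The residualization step mentioned under (A2) can be sketched as follows. This is a minimal illustration assuming an OLS projection of each period's outcome onto an intercept plus a pre-specified covariate basis; the function name `residualize`, the quadratic basis, and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def residualize(outcomes, basis):
    """Project each outcome column onto [1, basis] by OLS and return the residuals.

    The assignment vector is deliberately not an input, so the residuals are
    computed from covariates and outcomes only.
    """
    design = np.column_stack([np.ones(len(basis)), basis])
    coef, *_ = np.linalg.lstsq(design, outcomes, rcond=None)
    return outcomes - design @ coef

rng = np.random.default_rng(0)
n, n_periods = 500, 3
x = rng.normal(size=n)
basis = np.column_stack([x, x**2])                       # hypothetical pre-specified basis
outcomes = np.sin(x)[:, None] + rng.normal(size=(n, n_periods))
residuals = residualize(outcomes, basis)                 # shape (n, n_periods)
```

By construction the residuals are orthogonal to every column of the design matrix, which is an easy property to verify before passing them to any covariance-based step.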

Circularity Check

0 steps flagged

No significant circularity; derivation separates covariance-based estimation from assignment-based inference

The paper derives the unsigned CATE estimator from the covariance structure of residualized outcomes while explicitly leaving realized assignments untouched for the randomization component. Identification, consistency, and test validity are claimed as separately established results rather than by construction from the same fitted quantities. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided description. The post-hoc sign choice is presented as an auxiliary step whose impact on the exact finite-sample randomization distribution is asserted to be innocuous, without reducing the central claims to tautological inputs. This is the most common honest finding for papers whose core inference mechanism remains externally anchored in the known assignment mechanism.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the domain assumption that residual covariances identify the unsigned CATE magnitude and that sign selection preserves validity. No free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: the unsigned CATE is identifiable from the covariance structure of residualized outcomes.
This is the key modeling step that allows estimation without using the treatment assignments directly.
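
A numerical sanity check of this assumption is possible in a simplified setting. The sketch below assumes a toy data-generating process that is not the paper's: three periods, one Bernoulli(p) assignment per unit, constant effects `tau`, and noise that is uncorrelated across periods. Under those assumptions the off-diagonal residual covariances equal p(1-p) times products of effects, so ratios of covariances recover the effect magnitudes without using the assignment in the estimation step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200_000, 0.5
tau = np.array([1.0, -2.0, 0.5])                  # hypothetical true effects over three periods

z = rng.binomial(1, p, size=n)                    # one assignment per unit (used only to simulate)
noise = rng.normal(size=(n, 3))                   # uncorrelated across periods
outcomes = tau * z[:, None] + noise

# Residualize without using z: subtract the per-period sample mean.
residuals = outcomes - outcomes.mean(axis=0)
cov = np.cov(residuals, rowvar=False)

# Off-diagonal entries estimate p(1-p) * tau_s * tau_t, which is invariant to tau -> -tau.
c12, c13, c23 = cov[0, 1], cov[0, 2], cov[1, 2]
abs_tau = np.sqrt(np.abs(np.array([
    c12 * c13 / c23,                              # estimates p(1-p) * tau_1 ** 2
    c12 * c23 / c13,                              # estimates p(1-p) * tau_2 ** 2
    c13 * c23 / c12,                              # estimates p(1-p) * tau_3 ** 2
])) / (p * (1 - p)))
```

The estimate `abs_tau` should track `np.abs(tau)` closely at this sample size, but it carries no information about whether the true vector is `tau` or `-tau`, which is exactly the ambiguity the sign-selection step is meant to resolve. The ratio estimator becomes unstable when any effect is near zero, so it is a demonstration device only.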

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
We develop model-assisted randomization tests... estimate an unsigned version of the conditional average treatment effect (CATE) from the covariance structure of residualized outcomes
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear
the off-diagonal residual covariances depend on the CATE vector τ(x) through quadratic forms... τ(x) is identified up to sign
