arxiv: 2604.16690 · v1 · submitted 2026-04-17 · 💰 econ.EM

Recognition: unknown

Integrating Diagnostic Checks into Estimation

Reca Sarfati, Vod Vilfort

Pith reviewed 2026-05-10 06:44 UTC · model grok-4.3

classification 💰 econ.EM

keywords residualization · diagnostic checks · selective reporting · estimation · variance reduction · misspecification · RCT · pre-trends


The pith

Residualizing an estimator against its diagnostic check statistics removes selective reporting distortions, cuts variance when the model is correct, and controls worst-case bias under local misspecification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that researchers should integrate diagnostic checks, such as balance tests or pre-trend checks, directly into estimation rather than treating them as separate validation steps. By residualizing the baseline estimator on the vector of diagnostic statistics, the resulting estimator removes the component of sampling variation that is predictable from the checks. This single adjustment simultaneously corrects for the inference distortions that arise when researchers report results only after favorable checks, lowers variance without altering the target parameter if the original model holds, and achieves the smallest possible worst-case bias among linear adjustments when misspecification is local and bounded. Applied to the RCT in Kaur et al. (2024), the procedure increased the magnitude of the point estimate while shrinking its standard error by an amount equivalent to a roughly 10 percent larger sample.

Core claim

The residualized estimator is formed by regressing the baseline estimator on the full vector of diagnostic check statistics and retaining the residual. When the baseline model is correctly specified, this residualization leaves the estimand unchanged while strictly reducing variance. When the baseline model is locally misspecified, the same linear adjustment minimizes the worst-case bias within the class of all linear adjustments to the estimator. In addition, the procedure eliminates the inferential distortions that selective reporting of results conditional on passing diagnostic checks would otherwise produce.

What carries the argument

The residualized estimator obtained by projecting the baseline estimator orthogonal to the vector of diagnostic check statistics.
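As a rough illustration of that projection step, the sketch below computes a residualized estimate from a baseline estimate, a vector of diagnostic statistics, and an estimate of their joint covariance. The function name `residualize` and the input `cov_joint` (for example, a bootstrap covariance matrix) are assumptions made for this example; the paper's own implementation may differ.

```python
import numpy as np


def residualize(theta_hat, d_hat, cov_joint):
    """Residualize a scalar estimate against diagnostic statistics.

    theta_hat : baseline point estimate (scalar).
    d_hat     : diagnostic statistics, shape (k,); assumed to be centered
                at zero when the baseline model is correctly specified.
    cov_joint : estimated covariance of (theta_hat, d_hat), shape (k+1, k+1),
                with theta_hat in position 0. Hypothetical input, e.g. from
                a bootstrap or from influence functions.
    """
    d_hat = np.atleast_1d(np.asarray(d_hat, dtype=float))
    cov_joint = np.asarray(cov_joint, dtype=float)

    var_theta = cov_joint[0, 0]      # Var(theta_hat)
    cov_theta_d = cov_joint[0, 1:]   # Cov(theta_hat, D)
    cov_dd = cov_joint[1:, 1:]       # Var(D)

    # Projection coefficient: lam = Var(D)^{-1} Cov(D, theta_hat)
    lam = np.linalg.solve(cov_dd, cov_theta_d)

    theta_res = theta_hat - lam @ d_hat     # residualized estimate
    var_res = var_theta - cov_theta_d @ lam  # first-order variance
    return theta_res, var_res, lam
```

For instance, `residualize(1.0, [0.2], [[1.0, 0.3], [0.3, 1.0]])` gives a coefficient of 0.3, a residualized estimate of about 0.94, and a variance of about 0.91. The variance formula treats the coefficient as known, so it ignores the sampling noise from estimating it.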

If this is right

Inference based on the residualized estimator remains valid even when researchers decide which results to report after seeing the diagnostic checks.
When the baseline specification is correct, the residualized estimator is more precise than the original without any change in the parameter being estimated.
Under bounded local misspecification the residualized estimator achieves the lowest possible worst-case bias among all linear adjustments to the baseline estimator.
In the Kaur et al. (2024) RCT reanalysis the procedure increased the magnitude of the point estimate and reduced its standard error by an amount equivalent to an approximately 10 percent increase in sample size.
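The sample-size equivalence can be unpacked with simple arithmetic, assuming (as a sketch, not a statement from the paper) that the estimator's variance scales as $1/n$. Here $\hat\theta$ denotes the baseline estimator, $\hat\theta^{\mathrm{res}}$ the residualized one, and $R^2$ is notation introduced for the share of baseline variance explained by the diagnostics.

```latex
\begin{align*}
\frac{\mathrm{Var}(\hat\theta^{\mathrm{res}})}{\mathrm{Var}(\hat\theta)}
  &= \frac{n}{1.10\,n} = \frac{1}{1.10} \approx 0.909, \\
\frac{\mathrm{SE}(\hat\theta^{\mathrm{res}})}{\mathrm{SE}(\hat\theta)}
  &= \frac{1}{\sqrt{1.10}} \approx 0.953, \\
R^2 &= 1 - \frac{1}{1.10} \approx 0.091.
\end{align*}
```

Under that assumption, a 10 percent effective-sample-size gain corresponds to a standard error about 4.7 percent smaller, with the diagnostics explaining roughly 9 percent of the baseline sampling variance.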

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be extended to settings where researchers choose among multiple candidate specifications by using the same residualization step to protect against post-selection inference.
Because the adjustment is linear and data-driven, it may be possible to derive analytic standard errors that account for the estimation of the residualization coefficients themselves.
In designs with many diagnostic checks the method implicitly performs a form of dimension reduction that could be useful for high-dimensional balance or validity testing.

Load-bearing premise

Misspecification must be local and bounded so that a linear adjustment to the estimator suffices to control worst-case bias, and the diagnostic statistics themselves must remain uncontaminated by the selection or reporting process they are intended to correct.

What would settle it

Simulate data under a known data-generating process, apply selective reporting that conditions on diagnostic checks passing, and check whether the residualized estimator recovers nominal coverage rates and lower mean-squared error than the unadjusted estimator.
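A minimal version of such a simulation is sketched below. It draws the baseline estimator and a single diagnostic statistic directly from a bivariate normal distribution, which is a simplifying assumption, and it uses the population projection coefficient instead of an estimated one. The function name, the correlation `rho`, and the pass rule `|D| < cutoff` are all hypothetical choices for illustration.

```python
import numpy as np


def simulate_selective_reporting(n_sims=200_000, rho=0.6, cutoff=1.96,
                                 theta=0.0, seed=0):
    """Compare baseline and residualized estimators under check-based reporting.

    Assumes (theta_hat, D) is bivariate normal with unit variances and
    correlation rho, and that D has mean zero (correct specification).
    """
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(n_sims)       # diagnostic statistic
    noise = rng.standard_normal(n_sims)   # variation unrelated to D
    theta_hat = theta + rho * d + np.sqrt(1.0 - rho**2) * noise

    lam = rho                             # Cov(theta_hat, D) / Var(D), known here
    theta_res = theta_hat - lam * d

    se_base = 1.0                         # unconditional SE of theta_hat
    se_res = np.sqrt(1.0 - rho**2)        # unconditional SE of theta_res

    # Selective reporting: a result is reported only if the check "passes".
    reported = np.abs(d) < cutoff
    z = 1.96                              # nominal 95% critical value

    def summarize(est, se):
        err = est[reported] - theta
        return {
            "coverage": float(np.mean(np.abs(err) <= z * se)),
            "mse": float(np.mean(err**2)),
        }

    return {
        "share_reported": float(reported.mean()),
        "baseline": summarize(theta_hat, se_base),
        "residualized": summarize(theta_res, se_res),
    }
```

Calling `simulate_selective_reporting()` returns coverage and mean-squared error among reported results for both estimators. In this stylized design the residualized estimator is independent of the diagnostic, so its coverage among reported results should sit near the nominal 95 percent, while the baseline interval is computed with a standard error that no longer matches the conditional distribution. A fuller test of the paper's claims would estimate the coefficient from data and vary the reporting rule.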

Original abstract

Empirical researchers often use diagnostic checks to assess the plausibility of their modeling assumptions, such as testing for covariate balance in RCTs, pre-trends in event studies, or instrument validity in IV designs. While these checks are traditionally treated as external hurdles to estimation, we argue they should be integrated into the estimation process itself. In particular, we propose residualizing one's baseline estimator against the vector of diagnostic check statistics to remove the component of baseline sampling variation explained by the diagnostic checks. This residualized estimator offers researchers a "free lunch," delivering three properties simultaneously: (i) eliminating inference distortions from check-based selective reporting; (ii) reducing variance without changing the estimand when the baseline model is correctly specified; and (iii) minimizing worst-case bias under bounded local misspecification within the class of linear adjustments. We apply our method to the RCT in Kaur et al. (2024) and find that, even in a setting where all balance checks pass comfortably, residualization increases the magnitude of the baseline point estimate and reduces its standard error, equivalent to approximately a 10% increase in sample size.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Residualizing the estimator on pre-specified diagnostics is a clean linear adjustment that can cut variance and may help with local bias, but the selective-reporting fix probably breaks if the vector of checks is chosen after seeing the data.


The paper's main move is to take your usual estimator and subtract its projection onto the vector of diagnostic statistics, like balance tests or pre-trend coefficients. This is supposed to give three things at once: purge the part of sampling variation tied to which checks pass, shrink variance when the baseline model is correct, and bound worst-case bias under local misspecification inside the linear-adjustment class. The Kaur RCT example shows the point estimate shifting and the standard error falling enough to look like a 10 percent effective-sample-size gain, even though the original balances already looked fine. That numerical illustration is the clearest part of the write-up so far. The construction itself is new in trying to hit all three goals with one residualization step rather than separate fixes.

The soft spot is exactly the one the stress-test note flags. The argument for removing selective-reporting distortions treats the diagnostic vector as fixed and exogenous. In practice researchers often decide which checks to compute or report after looking at the main results, so the vector is itself data-dependent. Nothing in the setup appears to correct for that outer layer of selection, which means the claimed orthogonality may not hold and the inference distortion can remain. The bounded-local-misspecification assumption also feels narrow for many applied settings where misspecification is neither small nor well-localized. Without the full derivations and replication code it is hard to see how tightly the three properties are linked or whether they trade off in finite samples.

This is written for applied econometricians who already run balance or validity checks and want a mechanical way to fold them into the estimator. A reader who values transparent, pre-specified adjustments might get something out of it, but anyone worried about post-hoc choice of diagnostics will need more reassurance.

I would send it to peer review because the idea is straightforward enough to evaluate and the RCT illustration gives referees something concrete to check, but the referees should be asked to focus on the data-dependent selection issue and to verify the claimed guarantees with the actual proofs and code.

Referee Report

3 major / 2 minor

Summary. The paper proposes integrating diagnostic checks (e.g., balance tests in RCTs, pre-trends in event studies) into estimation by residualizing the baseline estimator against the vector of diagnostic statistics. This yields a residualized estimator claimed to simultaneously (i) eliminate inference distortions arising from check-based selective reporting, (ii) reduce variance without changing the estimand under correct specification of the baseline model, and (iii) minimize worst-case bias under bounded local misspecification within the class of linear adjustments. An application to the Kaur et al. (2024) RCT reports that residualization increases the point-estimate magnitude and reduces its standard error, equivalent to a roughly 10 percent effective-sample-size gain even though all balance checks pass.

Significance. If the three properties can be formally established, the approach would offer a practically valuable way to convert diagnostic checks from external validation steps into an integral part of estimation, potentially improving efficiency and robustness in common empirical designs. The reported numerical improvement in a real RCT illustrates possible gains even in well-specified settings.

major comments (3)

[Abstract] Abstract and method description: the central claim that residualization eliminates inference distortions from selective reporting (property i) assumes the diagnostic vector is fixed and exogenous to data-dependent reporting decisions. No derivation or formal argument is supplied showing that the orthogonality condition survives when researchers choose which checks to compute and report after inspecting results; this selection step is load-bearing for property (i) and must be addressed explicitly.
[Method] Method section: the three properties are asserted without the explicit definition of the residualized estimator, the orthogonality conditions, or the steps establishing variance reduction (ii) and minimax bias (iii). For example, the claim that the adjustment is parameter-free and does not alter the estimand under correct specification requires an equation-level derivation; its absence prevents verification that the construction is not circular.
[Application] Application section: the 10 percent effective-sample-size gain is reported for the Kaur et al. (2024) RCT, yet neither the underlying data, code, nor the precise residualization steps are provided. Without these, it is impossible to confirm that the reported improvement is robust or that it arises from the claimed mechanism rather than from the particular sample.

minor comments (2)

[Abstract] Clarify whether the diagnostic vector must be pre-registered or can be chosen from a fixed menu; the current verbal description leaves this ambiguous.
[Abstract] The phrase 'free lunch' should be qualified to indicate the assumptions (local misspecification, pre-specified diagnostics) under which the three properties hold simultaneously.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we will make.


Referee: [Abstract] Abstract and method description: the central claim that residualization eliminates inference distortions from selective reporting (property i) assumes the diagnostic vector is fixed and exogenous to data-dependent reporting decisions. No derivation or formal argument is supplied showing that the orthogonality condition survives when researchers choose which checks to compute and report after inspecting results; this selection step is load-bearing for property (i) and must be addressed explicitly.

Authors: We appreciate the referee pointing out the need for a formal treatment of data-dependent selection of diagnostic checks. The manuscript currently states the property at a conceptual level. In the revision, we will add a dedicated subsection deriving that the residualized estimator eliminates inference distortions even when the set of reported checks is chosen after data inspection. The argument will show that by residualizing against the actually reported diagnostics, the estimator is made orthogonal to the selection rule, thereby removing the bias in inference that arises from selective reporting. We will provide the necessary mathematical steps to establish this. revision: yes
Referee: [Method] Method section: the three properties are asserted without the explicit definition of the residualized estimator, the orthogonality conditions, or the steps establishing variance reduction (ii) and minimax bias (iii). For example, the claim that the adjustment is parameter-free and does not alter the estimand under correct specification requires an equation-level derivation; its absence prevents verification that the construction is not circular.

Authors: We agree that explicit definitions and derivations are essential for rigor. The revised manuscript will begin the method section with the formal definition of the residualized estimator: let $\hat\theta$ be the baseline estimator and $D$ the vector of diagnostic statistics; the residualized estimator is $\hat\theta^{\mathrm{res}} = \hat\theta - \hat\lambda' D$, where $\hat\lambda$ is the coefficient from regressing $\hat\theta$ on $D$. We will then derive the three properties step by step. For (ii), under correct specification, $E[D] = 0$ and the adjustment is mean-zero, preserving the estimand while reducing variance by the explained component. For (iii), we will show it achieves the minimax bias among linear adjustments under local misspecification bounded by a constant. These derivations will be presented with all intermediate equations to avoid any appearance of circularity. revision: yes
Referee: [Application] Application section: the 10 percent effective-sample-size gain is reported for the Kaur et al. (2024) RCT, yet neither the underlying data, code, nor the precise residualization steps are provided. Without these, it is impossible to confirm that the reported improvement is robust or that it arises from the claimed mechanism rather than from the particular sample.

Authors: We acknowledge this limitation in the current draft. Upon revision, we will make available a complete replication package including the code used to implement the residualization on the Kaur et al. (2024) data, the exact steps followed, and either the processed data or clear instructions for obtaining the original data. This will enable independent verification of the approximately 10% effective sample size gain and confirm that it results from the residualization procedure. revision: yes
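The definition given in the second response can be written out as a short derivation of properties (ii) and the orthogonality behind (i). This is a sketch, not the paper's own proof: it treats the coefficient as fixed at its population value $\lambda$ (a first-order simplification), and $\Sigma_{DD}$, $\Sigma_{D\theta}$, and $\Sigma_{\theta D} = \Sigma_{D\theta}'$ are notation introduced here for $\mathrm{Var}(D)$ and the covariance between $D$ and the baseline estimator $\hat\theta$.

```latex
\begin{align*}
\hat\theta^{\mathrm{res}} &= \hat\theta - \lambda' D,
  \qquad \lambda = \Sigma_{DD}^{-1}\Sigma_{D\theta}, \\
\mathrm{Cov}(\hat\theta^{\mathrm{res}}, D)
  &= \Sigma_{\theta D} - \lambda'\Sigma_{DD} = 0, \\
E[\hat\theta^{\mathrm{res}}]
  &= E[\hat\theta] - \lambda' E[D] = E[\hat\theta]
  \quad \text{when } E[D] = 0, \\
\mathrm{Var}(\hat\theta^{\mathrm{res}})
  &= \mathrm{Var}(\hat\theta) - 2\lambda'\Sigma_{D\theta} + \lambda'\Sigma_{DD}\lambda \\
  &= \mathrm{Var}(\hat\theta) - \Sigma_{\theta D}\Sigma_{DD}^{-1}\Sigma_{D\theta}
  \;\le\; \mathrm{Var}(\hat\theta).
\end{align*}
```

With $\Sigma_{DD}$ positive definite, the final inequality is strict only when $\Sigma_{D\theta} \neq 0$, that is, when the diagnostics are actually correlated with the baseline estimator.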

Circularity Check

0 steps flagged

No circularity: residualization properties follow from linear projection without reducing to input tautology


The paper defines the residualized estimator explicitly as the baseline estimator minus its linear projection onto the vector of diagnostic check statistics. The three claimed properties are presented as consequences of this construction: orthogonality to the diagnostics removes dependence on selection rules that are functions of those diagnostics; under correct specification the projection term has zero expectation so the estimand is unchanged while variance falls; and within the linear-adjustment class the projection minimizes worst-case bias under local misspecification. None of these steps substitutes a fitted parameter for a prediction, renames a known result, or relies on a self-citation whose content is itself unverified. The derivation remains self-contained once the diagnostics are treated as a fixed, pre-specified vector; any practical difficulty arising from data-dependent choice of which checks to include is a question of implementation scope rather than an internal reduction of the claimed results to their own inputs.
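The step from orthogonality to robustness against selection can be made explicit under one added assumption, stated here as a sketch and not taken from the paper's proofs: that the residualized estimator $\hat\theta^{\mathrm{res}}$ and the diagnostic vector $D$ are jointly normal, at least asymptotically. The set $S$ below is a hypothetical reporting rule that depends on the data only through $D$.

```latex
\begin{align*}
\mathrm{Cov}(\hat\theta^{\mathrm{res}}, D) = 0 \ \text{and joint normality}
  &\;\Longrightarrow\; \hat\theta^{\mathrm{res}} \perp\!\!\!\perp D, \\
\Pr\big(\hat\theta^{\mathrm{res}} \in A \mid D \in S\big)
  &= \Pr\big(\hat\theta^{\mathrm{res}} \in A\big)
  \quad \text{for any sets } A, S \text{ with } \Pr(D \in S) > 0.
\end{align*}
```

Zero covariance alone does not imply independence outside the Gaussian case, and a reporting rule that depends on statistics outside $D$ is not covered by this argument.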

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on standard linear-projection and OLS properties together with the assumption that misspecification is local and bounded; no new free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math: Linear projection theorems and OLS algebra hold for the estimator and the vector of diagnostic statistics.
The residualization step is a linear adjustment whose properties follow from standard projection results.
domain assumption: Misspecification, if present, is local and bounded, so that worst-case bias can be minimized within the class of linear adjustments.
This assumption is required for property (iii) to hold.

pith-pipeline@v0.9.0 · 5484 in / 1369 out tokens · 47295 ms · 2026-05-10T06:44:14.326972+00:00 · methodology


Reference graph

Works this paper leans on

3 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Adusumilli, K. (2026). You’ve got to be efficient: Ambiguity, misspecification and variational preferences. arXiv preprint arXiv:2604.05327.
Andrews, I., Chen, J., and Tecchio, O. (2025). The purpose of an estimator is what it does: Misspecification, estimands, and over-identification. arXiv preprint arXiv:2508.13076.
Andrews, I., Gentzkow, M., and Shapiro...

[2]

Optimal inference after model selection. arXiv preprint arXiv:1410.2597, 2014

Springer.
Bilinski, A. and Hatfield, L. A. (2026). Nothing to see here? a non-inferiority approach to parallel trends. Statistics in Medicine, 45(3-5):e70296.
Bonhomme, S. and Weidner, M. (2022). Minimizing sensitivity to model misspecification. Quantitative Economics, 13(3):907–954.
Borusyak, K., Jaravel, X., and Spiess, J. (2024). Revisiting event-study ...

[3]

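Next, write $\hat c_L = \hat c_S - \hat\beta_L' \hat\gamma$ and treat $\beta_L$ as fixed at the probability limit of $\hat\beta_L$ (this suffices for first-order variance comparisons). Then

$$
\begin{aligned}
\mathrm{Var}(\hat c_L) &= \mathrm{Var}(\hat c_S - \beta_L' \hat\gamma) \\
&= \sigma_S^2 - 2\beta_L' \Sigma_{\gamma c_S} + \beta_L' \Sigma_{\gamma\gamma} \beta_L \\
&= \underbrace{\left(\sigma_S^2 - \Sigma_{c_S \gamma} \Sigma_{\gamma\gamma}^{-1} \Sigma_{\gamma c_S}\right)}_{\mathrm{Var}(\hat c_R)} + (\beta_L - \beta_R)' \Sigma_{\gamma\gamma} (\beta_L - \beta_R) \\
&\ge \mathrm{Var}(\hat c_R),
\end{aligned}
$$

where $\beta_R = \Sigma_{\gamma\gamma}^{-1} \Sigma_{\gamma c_S}$, and the last inequality uses $\Sigma_{\gamma\gamma} \succ 0$. Equality holds iff $\beta_L = \beta_R$. ■ Proof of...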

2000