arXiv: 2605.15115 · v1 · submitted 2026-05-14 · econ.EM · stat.ME


A Practical Guide to Instrumental Variables Methods with Heterogeneous Treatment Effects

Tymon Słoczyński, Liyang Sun, S. Derya Uysal


Pith reviewed 2026-05-15 02:51 UTC · model grok-4.3


keywords: instrumental variables · LATE · heterogeneous treatment effects · covariates · monotonicity violations · robustness checks · applied econometrics


The pith

Different specifications for covariates in instrumental variables regressions produce distinct weighted averages of covariate-specific local average treatment effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that including covariates in IV models is not neutral: each common specification implicitly applies a different set of weights to the local average treatment effects that vary with covariates. This means researchers can obtain numerically different results from the same data, each with its own causal meaning. When the functional form for covariates is misspecified, the resulting estimate may no longer have a clear causal interpretation as a weighted LATE. The authors therefore advocate for flexible specifications as robustness checks and review tests for the validity of the LATE assumptions including monotonicity. They also provide guidance on software to implement these methods in practice.

Core claim

Different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. Parametric misspecification can undermine the causal interpretation of these estimands, so flexible specifications are essential robustness checks. The paper also covers formal tests for LATE assumptions and methods robust to monotonicity violations.

What carries the argument

Weighted averages of covariate-specific local average treatment effects (LATEs) under different covariate specifications in IV estimation.
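
To make the weighting concrete, here is a minimal Python sketch on simulated data. It is not taken from the paper: the data-generating process, the helper `tsls_coef`, and all variable names are hypothetical. It assumes a binary instrument, a binary treatment, and covariates saturated as cell dummies. Under those assumptions, the 2SLS coefficient with a single instrument and the one with cell-interacted instruments each reproduce a different weighted average of the cell-level Wald estimates, and both generally differ from the unconditional LATE analogue.

```python
import numpy as np

def tsls_coef(y, d, controls, instruments):
    """2SLS coefficient on d; `controls` enter both stages."""
    X = np.column_stack([d, controls])
    Z = np.column_stack([instruments, controls])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][0]

# Hypothetical DGP: three covariate cells, compliers and never-takers only.
rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 3, size=n)
z = (rng.random(n) < np.array([0.5, 0.2, 0.8])[x]).astype(float)  # P(Z=1|X)
complier = rng.random(n) < np.array([0.2, 0.5, 0.8])[x]           # complier share
d = np.where(complier, z, 0.0)
y = x + np.array([1.0, 2.0, 4.0])[x] * d + rng.normal(size=n)     # cell LATEs 1, 2, 4

cells = (x[:, None] == np.arange(3)).astype(float)  # saturated covariate dummies

# Specification A: cell dummies as controls, Z as the single instrument.
b_single = tsls_coef(y, d, cells, z)
# Specification B: cell dummies as controls, Z interacted with every cell.
b_inter = tsls_coef(y, d, cells, cells * z[:, None])

# Cell-level ingredients: size, Var(Z|X), first stage, Wald estimate.
n_x = cells.sum(axis=0)
var_z, fs, wald = np.empty(3), np.empty(3), np.empty(3)
for k in range(3):
    m = x == k
    z1, z0 = m & (z == 1), m & (z == 0)
    var_z[k] = z[m].mean() * (1 - z[m].mean())
    fs[k] = d[z1].mean() - d[z0].mean()
    wald[k] = (y[z1].mean() - y[z0].mean()) / fs[k]

def weighted(w):
    """Weighted average of the cell-level Wald estimates."""
    return float(np.sum(w * wald) / np.sum(w))

late = weighted(n_x * fs)                       # unconditional LATE analogue
implied_single = weighted(n_x * var_z * fs)     # weights implied by spec A
implied_inter = weighted(n_x * var_z * fs**2)   # weights implied by spec B

print(f"LATE analogue: {late:.3f}")
print(f"single instrument: {b_single:.3f} (implied {implied_single:.3f})")
print(f"interacted instruments: {b_inter:.3f} (implied {implied_inter:.3f})")
```

With saturated covariates the match between each 2SLS coefficient and its implied weighted average is an in-sample algebraic identity, not an approximation. With non-saturated (for example, linear) controls the decomposition need not hold, which is where the misspecification concern enters.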

If this is right

Different covariate specifications imply different weightings over heterogeneous effects.
Misspecified parametric forms for covariates can invalidate the causal claims of the IV estimate.
Flexible specifications provide necessary checks on the robustness of the results.
Tests for the LATE assumptions and monotonicity-robust methods should be standard practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applied researchers might start reporting the implied weights or the range of estimates across specifications to show sensitivity.
These considerations could extend to other estimators that rely on weighting, such as in matching or regression discontinuity designs.
Software tools that automatically compare multiple specifications could help standardize this practice.

Load-bearing premise

That researchers will correctly implement the flexible specifications and tests in their applications, and that the LATE assumptions plausibly hold in their data.

What would settle it

Finding a dataset where switching from a parametric to a flexible covariate specification changes the IV estimate in a way that cannot be explained by sampling variation alone.
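
One way to operationalize such a check is a paired bootstrap of the gap between the two estimates. The sketch below is an illustration under stated assumptions, not a procedure from the paper: the simulated data, the helper names `tsls_coef` and `spec_gap`, and the choice of a percentile interval are all hypothetical. It compares 2SLS with the covariate entering linearly against 2SLS with one dummy per covariate cell.

```python
import numpy as np

def tsls_coef(y, d, controls, z):
    """2SLS coefficient on d with instrument z; `controls` enter both stages."""
    X = np.column_stack([d, controls])
    Z = np.column_stack([z, controls])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][0]

def spec_gap(y, d, z, x):
    """Flexible-minus-parametric difference in the 2SLS estimate."""
    linear = np.column_stack([np.ones_like(y), x])          # x enters linearly
    saturated = (x[:, None] == np.unique(x)).astype(float)  # one dummy per cell
    return tsls_coef(y, d, saturated, z) - tsls_coef(y, d, linear, z)

# Hypothetical DGP in which P(Z=1|X) is not linear in X.
rng = np.random.default_rng(0)
n = 3_000
x = rng.integers(0, 4, size=n)
z = (rng.random(n) < np.array([0.5, 0.1, 0.9, 0.5])[x]).astype(float)
complier = rng.random(n) < 0.6
d = np.where(complier, z, 0.0)
y = x**2 + np.array([0.0, 1.0, 3.0, 6.0])[x] * d + rng.normal(size=n)

point_gap = spec_gap(y, d, z, x)

# Paired bootstrap: both specifications are re-estimated on the same resample.
gaps = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    gaps.append(spec_gap(y[idx], d[idx], z[idx], x[idx]))
ci_low, ci_high = np.percentile(gaps, [2.5, 97.5])

print(f"gap = {point_gap:.3f}, 95% percentile interval = [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero suggests the two specifications differ by more than sampling noise. Resampling the pair jointly matters because the two estimates are strongly correlated within a sample.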

Original abstract

Instrumental variables (IV) methods are central to applied microeconomics. While classical approaches assume linear models with constant effects, recent literature has shifted toward the local average treatment effect (LATE) framework to accommodate heterogeneous treatment effects. This paper provides a practical guide to aligning empirical practice with recent theory. We first examine how different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. We then discuss how parametric misspecification can undermine the causal interpretation of these estimands and suggest flexible specifications as essential robustness checks. Finally, we review formal tests for LATE assumptions and methods robust to monotonicity violations. We provide a guide to software implementations to help researchers apply the methods in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a useful compilation of LATE results for IV practitioners but adds no new theory or evidence.


This is a clear but unoriginal guide to IV methods under heterogeneous effects. The authors restate that different covariate specifications in 2SLS recover distinct weighted averages of conditional LATEs, and that misspecification can move the estimand away from something causally meaningful.

The paper does well at making these points accessible. It explains how the estimator can be seen as an integral over conditional LATEs with weights based on the conditional first-stage strength. It flags the need for flexible specifications as essential robustness checks and reviews formal tests for LATE assumptions and methods robust to monotonicity violations. The guide to software implementations helps researchers apply these methods in practice.

The main limitation is the lack of novelty or new evidence. The central claims align with standard results, as the stress test notes, but the paper adds neither simulations showing the practical importance of these issues nor new applications. Readers must still judge for themselves whether the underlying assumptions hold in their setting. There is also little discussion of what to do when different specifications give conflicting results.

Applied economists running IV regressions would benefit from this as a reference. It is not essential reading for specialists but could improve standard practice. I would send it to peer review as a methods paper.

Referee Report

0 major / 2 minor

Summary. The manuscript provides a practical guide to instrumental variables (IV) methods under heterogeneous treatment effects. It shows that different covariate specifications recover distinct weighted averages of covariate-specific LATEs, warns that parametric misspecification can undermine causal interpretability of the resulting estimands, reviews formal tests for LATE assumptions and approaches robust to monotonicity violations, and supplies software implementation guidance for applied researchers.

Significance. If followed, the recommendations would help applied microeconomists produce more robust and interpretable IV estimates by aligning practice with established LATE weighting results and by encouraging flexible specifications plus formal assumption tests. The paper's value lies in its synthesis of theory into actionable checks rather than in new theoretical results.

minor comments (2)

[Software implementations] The software section would be strengthened by including short, self-contained code snippets (or repository links) that demonstrate the recommended flexible specifications and the reviewed tests for LATE assumptions.
[Covariate specifications] When discussing the weighting of covariate-specific LATEs, an explicit citation to the relevant 2SLS weighting formula (or a brief derivation) would make the exposition self-contained for readers who have not recently consulted the foundational references.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review. The referee's summary accurately reflects the manuscript's focus on synthesizing LATE weighting results, the risks of parametric misspecification, and practical guidance for applied researchers. We appreciate the recognition that the paper's value lies in translating theory into actionable checks rather than new theoretical contributions. We will make the minor revisions needed to strengthen the presentation.

Circularity Check

0 steps flagged

No significant circularity; claims follow from cited external theory


The paper is an expository guide whose central claims—that different covariate specifications in IV models recover distinct weighted averages of covariate-specific LATEs and that parametric misspecification can alter causal interpretability—directly restate standard results on 2SLS weighting from the existing LATE literature (e.g., the estimator as an integral over conditional LATEs weighted by conditional first-stage strength). These are presented as reviews of prior theory rather than new derivations internal to the paper. Recommendations for flexible specifications and tests are robustness advice, not fitted predictions or self-definitional constructs. Any self-citations are non-load-bearing and do not reduce the argument to unverified internal inputs. The derivation chain is self-contained against external benchmarks.
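
For reference, the weighting results alluded to above can be written out as follows. This is a sketch in notation chosen here for illustration, not the paper's own. It assumes a binary instrument Z and binary treatment D, covariates X entered as a saturated set of cell dummies, instrument validity conditional on X, and monotonicity with a nonnegative first stage in every cell. Here "single" denotes 2SLS with Z as the only instrument, and "interacted" denotes 2SLS with Z interacted with every cell dummy.

```latex
% Notation (assumed for illustration):
%   \tau(x) = E[Y_1 - Y_0 \mid D_1 > D_0, X = x]        covariate-specific LATE
%   \pi(x)  = E[D \mid Z=1, X=x] - E[D \mid Z=0, X=x]    conditional first stage
%   p(x)    = P(Z = 1 \mid X = x)                        instrument propensity score
\begin{align*}
\tau_{\mathrm{LATE}}
  &= \frac{E[\pi(X)\,\tau(X)]}{E[\pi(X)]}, \\
\beta_{\mathrm{single}}
  &= \frac{E[\,p(X)\{1-p(X)\}\,\pi(X)\,\tau(X)\,]}{E[\,p(X)\{1-p(X)\}\,\pi(X)\,]}, \\
\beta_{\mathrm{interacted}}
  &= \frac{E[\,p(X)\{1-p(X)\}\,\pi(X)^{2}\,\tau(X)\,]}{E[\,p(X)\{1-p(X)\}\,\pi(X)^{2}\,]}.
\end{align*}
```

All three are convex combinations of the same covariate-specific LATEs. They coincide when the LATE is constant across cells, or when the weights happen to be proportional, and otherwise generally differ.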

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The guide rests on standard IV and LATE assumptions from prior literature without introducing new fitted parameters or invented entities in the abstract.

axioms (1)

Domain assumption: standard IV assumptions (relevance, exclusion restriction, and monotonicity) for LATE identification
The discussion of weighted LATEs and tests for assumptions directly invokes these classical conditions.
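
These conditions have testable implications. With a binary instrument and treatment, a valid instrument together with monotonicity (the instrument weakly raising treatment) implies that P(Y=v, D=1 | Z=1) is at least P(Y=v, D=1 | Z=0), and that P(Y=v, D=0 | Z=0) is at least P(Y=v, D=0 | Z=1), for every outcome value v. The Python sketch below computes sample analogues of these slacks for a discrete outcome. It is only an illustration: the function name `inequality_slacks` and both simulated datasets are hypothetical, and it performs no formal inference, which the tests reviewed in the paper are designed to supply.

```python
import numpy as np

def inequality_slacks(y, d, z):
    """Sample slacks of the two inequality families; negative values
    point toward a violation (before accounting for sampling noise)."""
    def joint(v, dv, zv):
        keep = z == zv
        return np.mean((y[keep] == v) & (d[keep] == dv))  # P(Y=v, D=dv | Z=zv)
    values = np.unique(y)
    treated = np.array([joint(v, 1, 1) - joint(v, 1, 0) for v in values])
    untreated = np.array([joint(v, 0, 0) - joint(v, 0, 1) for v in values])
    return treated, untreated

rng = np.random.default_rng(1)
n = 5_000
z = rng.integers(0, 2, size=n)
u = rng.random(n)

# Valid design: 20% always-takers, 50% compliers, 30% never-takers.
d_valid = np.where(u < 0.2, 1, np.where(u < 0.7, z, 0))
y_valid = (rng.random(n) < 0.3 + 0.4 * d_valid).astype(int)

# Invalid design: the never-takers are replaced by defiers whose outcome
# equals their treatment status, while everyone else has outcome 0.
defier = u >= 0.7
d_bad = np.where(u < 0.2, 1, np.where(u < 0.7, z, 1 - z))
y_bad = np.where(defier, d_bad, 0)

valid_t, valid_u = inequality_slacks(y_valid, d_valid, z)
bad_t, bad_u = inequality_slacks(y_bad, d_bad, z)

print("valid design, smallest slack:", min(valid_t.min(), valid_u.min()))
print("invalid design, smallest slack:", min(bad_t.min(), bad_u.min()))
```

Passing these inequalities does not establish validity; they are necessary conditions only. A negative sample slack should likewise be weighed against its sampling variability before concluding anything.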

pith-pipeline@v0.9.0 · 5424 in / 1186 out tokens · 35318 ms · 2026-05-15T02:51:14.635583+00:00 · methodology

