A Practical Guide to Instrumental Variables Methods with Heterogeneous Treatment Effects
Pith reviewed 2026-05-15 02:51 UTC · model grok-4.3
The pith
Different specifications for covariates in instrumental variables regressions produce distinct weighted averages of covariate-specific local average treatment effects.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. Parametric misspecification can undermine the causal interpretation of these estimands, so flexible specifications are essential robustness checks. The paper also covers formal tests for LATE assumptions and methods robust to monotonicity violations.
What carries the argument
Weighted averages of covariate-specific local average treatment effects (LATEs) under different covariate specifications in IV estimation.
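The Python sketch below makes the weighting concrete for a discrete covariate X and a binary instrument Z. It assumes the LATE assumptions hold within each covariate cell with a positive first stage, and it uses standard weight formulas from the LATE literature. Write p(x) = P(Z = 1 | X = x) and π(x) for the covariate-specific first stage. Each specification then weights cell x differently:

- The complier-weighted (unconditional) LATE uses P(X = x)·π(x).
- 2SLS with a single instrument and a full set of X dummies as controls uses P(X = x)·p(x)(1 - p(x))·π(x).
- 2SLS that also interacts Z with the X dummies uses P(X = x)·p(x)(1 - p(x))·π(x)².

All cell-level numbers are hypothetical, and the function names are illustrative rather than taken from any package.

```python
"""Sketch: how three IV specifications weight covariate-specific LATEs.

All inputs are hypothetical population quantities for a discrete covariate X.
Assumes the LATE assumptions hold within each cell and pi(x) > 0 everywhere.
"""


def weighted_average(values, weights):
    """Return sum(w * v) / sum(w); the weights need not be normalized."""
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def complier_weights(share, first_stage):
    # Unconditional LATE: weight each cell by its share of all compliers,
    # P(X = x) * pi(x).
    return [s * f for s, f in zip(share, first_stage)]


def saturated_controls_weights(share, p_z, first_stage):
    # 2SLS with one binary instrument Z and a full set of X dummies as controls:
    # weight proportional to P(X = x) * Var(Z | X = x) * pi(x).
    return [s * p * (1 - p) * f for s, p, f in zip(share, p_z, first_stage)]


def interacted_weights(share, p_z, first_stage):
    # 2SLS with Z x (X dummies) as instruments and X dummies as controls:
    # weight proportional to P(X = x) * Var(Z | X = x) * pi(x) ** 2.
    return [s * p * (1 - p) * f ** 2 for s, p, f in zip(share, p_z, first_stage)]


share = [0.5, 0.3, 0.2]        # P(X = x), hypothetical
p_z = [0.5, 0.2, 0.9]          # P(Z = 1 | X = x), hypothetical
first_stage = [0.6, 0.3, 0.5]  # pi(x): complier share in cell x, hypothetical
late = [1.0, 2.0, 4.0]         # LATE(x), hypothetical

estimands = {
    "complier-weighted LATE": weighted_average(late, complier_weights(share, first_stage)),
    "saturated controls": weighted_average(
        late, saturated_controls_weights(share, p_z, first_stage)
    ),
    "interacted instruments": weighted_average(
        late, interacted_weights(share, p_z, first_stage)
    ),
}
for name, value in estimands.items():
    print(f"{name:>24}: {value:.3f}")
```

With these inputs the three estimands come out at about 1.796, 1.421 and 1.331. Each one is a nonnegative weighted average of the same LATE(x) values; they differ only in how much each cell counts.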
If this is right
- Different covariate specifications imply different weightings over heterogeneous effects.
- Misspecified parametric forms for covariates can invalidate the causal claims of the IV estimate.
- Flexible specifications provide necessary checks on the robustness of the results.
- Tests for the LATE assumptions and monotonicity-robust methods should be standard practice.
Where Pith is reading between the lines
- Applied researchers might start reporting the implied weights or the range of estimates across specifications to show sensitivity.
- These considerations could extend to other estimators that rely on weighting, such as matching estimators or regression discontinuity designs.
- Software tools that automatically compare multiple specifications could help standardize this practice.
Load-bearing premise
That researchers will correctly implement the flexible specifications and tests in their applications, and that the LATE assumptions plausibly hold in the data.
What would settle it
Finding a dataset where switching from a parametric to a flexible covariate specification changes the IV estimate in a way that cannot be explained by sampling variation alone.
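The Python sketch below (using NumPy) shows one way to probe this in simulated data. The data-generating process is hypothetical:

- X takes three values.
- P(Z = 1 | X) and E[Y(0) | X] are both nonlinear in X.
- Compliance types (always-takers, compliers, never-takers) are independent of Z given X.

In this design the complier share and Var(Z | X) are the same in every cell. The complier LATE and the saturated-control 2SLS estimand therefore both equal the average of τ(x) = (1, 1, 4), which is 2.0. The helper `iv_coefficient` is an illustrative just-identified IV estimator written for this sketch, not a function from any package.

```python
import numpy as np


def iv_coefficient(y, d, z, controls):
    """Just-identified IV of y on d using instrument z, with the same
    controls in both stages. Returns the coefficient on d."""
    regressors = np.column_stack([d, controls])
    instruments = np.column_stack([z, controls])
    coef = np.linalg.solve(instruments.T @ regressors, instruments.T @ y)
    return coef[0]


rng = np.random.default_rng(0)
n = 200_000

# Hypothetical DGP: covariate X in {0, 1, 2} with equal probabilities.
x = rng.integers(0, 3, size=n)
p_z = np.array([0.2, 0.8, 0.2])[x]          # P(Z = 1 | X), nonlinear in X
z = (rng.random(n) < p_z).astype(float)

# Compliance types drawn independently of Z: 20% always-takers,
# 50% compliers, 30% never-takers (so monotonicity holds).
u = rng.random(n)
always = u < 0.2
complier = (u >= 0.2) & (u < 0.7)
d = (always | (complier & (z == 1))).astype(float)

tau = np.array([1.0, 1.0, 4.0])[x]           # treatment effect tau(x)
y = x.astype(float) ** 2 + d * tau + rng.normal(size=n)  # E[Y(0) | X] = X**2

ones = np.ones(n)
linear = np.column_stack([ones, x])           # parametric: controls linear in X
saturated = np.column_stack(
    [ones, (x == 1).astype(float), (x == 2).astype(float)]
)                                             # flexible: a dummy for each value of X

beta_linear = iv_coefficient(y, d, z, linear)
beta_saturated = iv_coefficient(y, d, z, saturated)
print(f"linear controls:    {beta_linear:.3f}")
print(f"saturated controls: {beta_saturated:.3f}")
```

In this design the saturated specification should land close to 2.0. The specification that is linear in X lands far from it (its population value is roughly 0.06), because the linear control does not absorb the nonlinear dependence of Z and Y(0) on X. In real data the gap would need to be judged against its sampling variability, which this sketch does not compute.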
Original abstract
Instrumental variables (IV) methods are central to applied microeconomics. While classical approaches assume linear models with constant effects, recent literature has shifted toward the local average treatment effect (LATE) framework to accommodate heterogeneous treatment effects. This paper provides a practical guide to aligning empirical practice with recent theory. We first examine how different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. We then discuss how parametric misspecification can undermine the causal interpretation of these estimands and suggest flexible specifications as essential robustness checks. Finally, we review formal tests for LATE assumptions and methods robust to monotonicity violations. We provide a guide to software implementations to help researchers apply the methods in practice.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript provides a practical guide to instrumental variables (IV) methods under heterogeneous treatment effects. It shows that different covariate specifications recover distinct weighted averages of covariate-specific LATEs, warns that parametric misspecification can undermine causal interpretability of the resulting estimands, reviews formal tests for LATE assumptions and approaches robust to monotonicity violations, and supplies software implementation guidance for applied researchers.
Significance. If followed, the recommendations would help applied microeconomists produce more robust and interpretable IV estimates by aligning practice with established LATE weighting results and by encouraging flexible specifications plus formal assumption tests. The paper's value lies in its synthesis of theory into actionable checks rather than in new theoretical results.
minor comments (2)
- [Software implementations] The software section would be strengthened by including short, self-contained code snippets (or repository links) that demonstrate the recommended flexible specifications and the reviewed tests for LATE assumptions.
- [Covariate specifications] When discussing the weighting of covariate-specific LATEs, an explicit citation to the relevant 2SLS weighting formula (or a brief derivation) would make the exposition self-contained for readers who have not recently consulted the foundational references.
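The block below sketches the standard weighting results in one place, for a binary instrument Z and a discrete covariate X. It assumes independence, exclusion and monotonicity conditional on X. Write p(x) = P(Z = 1 | X = x), π(x) = E[D | Z = 1, X = x] - E[D | Z = 0, X = x], and LATE(x) for the covariate-specific Wald ratio. With a full set of X dummies as controls, the residualized instrument is Z - p(X), which gives the first two lines. Adding Z × X interactions as instruments ("saturate and weight") squares the first-stage term in the weights.

```latex
\begin{aligned}
\operatorname{Cov}\bigl(Y,\, Z - p(X)\bigr)
  &= E\bigl[\,p(X)\{1-p(X)\}\{E[Y \mid Z=1,X] - E[Y \mid Z=0,X]\}\bigr]
   = E\bigl[\,p(X)\{1-p(X)\}\,\pi(X)\,\mathrm{LATE}(X)\bigr],\\[4pt]
\beta_{\text{sat}}
  &= \frac{E\bigl[\,p(X)\{1-p(X)\}\,\pi(X)\,\mathrm{LATE}(X)\bigr]}
          {E\bigl[\,p(X)\{1-p(X)\}\,\pi(X)\bigr]},\\[4pt]
\beta_{\text{int}}
  &= \frac{E\bigl[\,p(X)\{1-p(X)\}\,\pi(X)^{2}\,\mathrm{LATE}(X)\bigr]}
          {E\bigl[\,p(X)\{1-p(X)\}\,\pi(X)^{2}\bigr]}.
\end{aligned}
```

Both are weighted averages of LATE(x) with nonnegative weights when monotonicity holds in the same direction in every cell (π(x) ≥ 0). The weights differ, however, so the two estimands generally differ whenever LATE(x) varies with x.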
Simulated Author's Rebuttal
We thank the referee for the positive and constructive review. The referee's summary accurately reflects the manuscript's focus on synthesizing LATE weighting results, the risks of parametric misspecification, and practical guidance for applied researchers. We appreciate the recognition that the paper's value lies in translating theory into actionable checks rather than new theoretical contributions. We will make the minor revisions needed to strengthen the presentation.
Circularity Check
No significant circularity; claims follow from cited external theory
full rationale
The paper is an expository guide. Its central claims are that different covariate specifications in IV models recover distinct weighted averages of covariate-specific LATEs, and that parametric misspecification can undermine the causal interpretation of these estimands. Both claims restate standard results on 2SLS weighting from the existing LATE literature (e.g., the 2SLS estimand as an average of conditional LATEs weighted by, among other things, conditional first-stage strength). These results are presented as reviews of prior theory rather than as new derivations internal to the paper. The recommendations for flexible specifications and tests are robustness advice, not fitted predictions or self-definitional constructs. Any self-citations are not load-bearing and do not reduce the argument to unverified internal inputs. The derivation chain therefore rests on external benchmarks rather than on results established only within the paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the standard IV assumptions for LATE identification, namely relevance, independence (as-good-as-random assignment of the instrument), the exclusion restriction, and monotonicity
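For binary Y, D and Z, these assumptions have well-known testable implications (see, e.g., Kitagawa, 2015). For y in {0, 1}, both of the following must hold:

- P(Y = y, D = 1 | Z = 1) - P(Y = y, D = 1 | Z = 0) ≥ 0
- P(Y = y, D = 0 | Z = 0) - P(Y = y, D = 0 | Z = 1) ≥ 0

Each difference equals the probability that a unit is a complier with the corresponding potential outcome equal to y. The Python sketch below only computes the sample analogues on a hypothetical toy dataset. It performs no inference, so a negative gap in real data would need a formal test, such as those reviewed in the paper, before it is read as a violation. The function names are illustrative.

```python
import numpy as np


def joint_prob(y, d, z, y_val, d_val, z_val):
    """Estimate P(Y = y_val, D = d_val | Z = z_val) by a within-arm sample mean."""
    arm = z == z_val
    return np.mean((y[arm] == y_val) & (d[arm] == d_val))


def late_implication_gaps(y, d, z):
    """Sample analogues of the LATE testable implications for binary Y, D, Z.

    In the population each gap is the probability mass of compliers with a given
    potential outcome, so it must be nonnegative under independence, exclusion
    and monotonicity D(1) >= D(0). Keys are (d_val, y_val).
    """
    y, d, z = (np.asarray(a) for a in (y, d, z))
    gaps = {}
    for y_val in (0, 1):
        # P(Y(1) = y, complier) = P(Y=y, D=1 | Z=1) - P(Y=y, D=1 | Z=0)
        gaps[(1, y_val)] = joint_prob(y, d, z, y_val, 1, 1) - joint_prob(y, d, z, y_val, 1, 0)
        # P(Y(0) = y, complier) = P(Y=y, D=0 | Z=0) - P(Y=y, D=0 | Z=1)
        gaps[(0, y_val)] = joint_prob(y, d, z, y_val, 0, 0) - joint_prob(y, d, z, y_val, 0, 1)
    return gaps


# Hypothetical toy data: four observations per instrument arm.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 1, 1, 1, 0, 0, 1]
for (d_val, y_val), gap in late_implication_gaps(y, d, z).items():
    flag = "  <- negative sample gap" if gap < 0 else ""
    print(f"D={d_val}, Y={y_val}: {gap:+.2f}{flag}")
```

In the toy data, the gap for untreated units with Y = 1 is -0.25. If it persisted in a large sample, this would indicate a failure of at least one of the assumptions. With eight observations it says nothing on its own.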
Reference graph
Works this paper leans on
- [1] Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2):231–263.
- [2] Anatolyev, S. (2019). Many instruments and/or regressors: A friendly guide. Journal of Economic Surveys, 33(2):689–726.
- [3] Andresen, M. E. (2026). montest: Testing LATE assumptions and monotonicity using machine learning. https://github.com/martin-andresen/montest
- [4] Angrist, J. D. and Imbens, G. W. (1995). Two-stage least squares estimation of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association, 90(430):431–442.
- [5] Angrist, J. D., Imbens, G. W., and Krueger, A. B. (1999). Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14(1):57–67.
- [6] Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455.
- [7] Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press, Princeton–Oxford.
- [8] Autor, D., Kostøl, A., Mogstad, M., and Setzler, B. (2019). Disability benefits, consumption insurance, and household labor supply. American Economic Review, 109(7):2613–2654.
- [9] Balke, A. and Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171–1176.
- [10] Belloni, A., Chernozhukov, V., Fernández-Val, I., and Hansen, C. (2017). Program evaluation and causal inference with high-dimensional data. Econometrica, 85(1):233–298.
- [11] Blandhol, C., Bonney, J., Mogstad, M., and Torgovitsky, A. (2026). When is TSLS actually LATE? Review of Economic Studies, forthcoming.
- [12] Borusyak, K., Hull, P., and Jaravel, X. (2025). A practical guide to shift-share instruments. Journal of Economic Perspectives, 39(1):181–204.
- [13] Carneiro, P., Heckman, J. J., and Vytlacil, E. J. (2011). Estimating marginal returns to education. American Economic Review, 101(6):2754–2781.
- [14] Carr, T. and Kitagawa, T. (2023). Testing instrument validity with covariates. arXiv:2112.08092.
- [15] Chao, J. C., Swanson, N. R., and Woutersen, T. (2023). Jackknife estimation of a cluster-sample IV regression model with many weak instruments. Journal of Econometrics, 235(2):1747–1769.
- [16] Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal, 21(1):C1–C68.
- [17] Chyn, E., Frandsen, B., and Leslie, E. (2025). Examiner and judge designs in economics: A practitioner's guide. Journal of Economic Literature, 63(2):401–439.
- [18] Currie, J., Kleven, H., and Zwiers, E. (2020). Technology and big data are changing economics: Mining text to track methods. AEA Papers and Proceedings, 110:42–48.
- [19] Dobbie, W., Goldin, J., and Yang, C. S. (2018). The effects of pretrial detention on conviction, future crime, and employment: Evidence from randomly assigned judges. American Economic Review, 108(2):201–240.
- [20] Farbmacher, H., Guber, R., and Klaassen, S. (2022). Instrument validity tests with causal forests. Journal of Business & Economic Statistics, 40(2):605–614.
- [21] Finkelstein, A., Taubman, S., Wright, B., Bernstein, M., Gruber, J., Newhouse, J. P., Allen, H., Baicker, K., and Oregon Health Study Group (2012). The Oregon Health Insurance Experiment: Evidence from the first year. Quarterly Journal of Economics, 127(3):1057–1106.
- [22] Frölich, M. (2007). Nonparametric IV estimation of local average treatment effects with covariates. Journal of Econometrics, 139(1):35–75.
- [23] Goldsmith-Pinkham, P. (2026). Tracking the credibility revolution across fields. NBER Working Paper no. 35051.
- [24] Goldsmith-Pinkham, P., Hull, P., and Kolesár, M. (2025). Leniency designs: An operator's manual. NBER Working Paper no. 34473.
- [25] Heckman, J. J. and Vytlacil, E. (2005). Structural equations, treatment effects, and econometric policy evaluation. Econometrica, 73(3):669–738.
- [26] Heiler, P. (2022). Efficient covariate balancing for the local average treatment effect. Journal of Business & Economic Statistics, 40(4):1569–1582.
- [27] Huber, M. and Mellace, G. (2015). Testing instrument validity for LATE identification based on inequality moment constraints. Review of Economics and Statistics, 97(2):398–411.
- [28] Imbens, G. W. and Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2):467–475.
- [29] Imbens, G. W. and Rubin, D. B. (1997). Estimating outcome distributions for compliers in instrumental variables models. Review of Economic Studies, 64(4):555–574.
- [30] Kitagawa, T. (2015). A test for instrument validity. Econometrica, 83(5):2043–2063.
- [31] Kolesár, M. (2013). Estimation in an instrumental variables model with treatment effect heterogeneity. Unpublished.
- [32] Kwon, S. and Roth, J. (2026). Testing mechanisms. Review of Economic Studies, forthcoming.
- [33] Ma, Y., Sant'Anna, P. H. C., Sasaki, Y., and Ura, T. (2026). Doubly robust estimators with weak overlap. arXiv:2304.08974.
- [34] Maestas, N., Mullen, K. J., and Strand, A. (2013). Does disability insurance receipt discourage work? Using examiner assignment to estimate causal effects of SSDI receipt. American Economic Review, 103(5):1797–1829.
- [35] Mao, M. and Sant'Anna, P. H. C. (2020). Testing instrument validity in marginal treatment effects models. Unpublished.
- [36] Mikusheva, A. and Sun, L. (2022). Inference with many weak instruments. Review of Economic Studies, 89(5):2663–2686.
- [37] Mikusheva, A. and Sun, L. (2024). Weak identification with many instruments. Econometrics Journal, 27(2):C1–C28.
- [38] Mogstad, M. and Torgovitsky, A. (2018). Identification and extrapolation of causal effects with instrumental variables. Annual Review of Economics, 10:577–613.
- [39] Mogstad, M. and Torgovitsky, A. (2024). Instrumental variables with unobserved heterogeneity in treatment effects. In Dustmann, C. and Lemieux, T., editors, Handbook of Labor Economics, Vol. 5, pages 1–114. Elsevier, Amsterdam.
- [40] Mourifié, I. and Wan, Y. (2017). Testing local average treatment effect assumptions. Review of Economics and Statistics, 99(2):305–313.
- [41] Papke, L. E. and Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11(6):619–632.
- [42] Poirier, A. and Słoczyński, T. (2025). Quantifying the internal validity of weighted estimands. arXiv:2404.14603.
- [43] Poterba, J. M., Venti, S. F., and Wise, D. A. (1994). 401(k) plans and tax-deferred saving. In Wise, D. A., editor, Studies in the Economics of Aging, pages 105–142. University of Chicago Press, Chicago–London.
- [44] Ramsey, J. B. (1969). Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society: Series B, 31(2):350–371.
- [45] Sant'Anna, P. H. C., Song, X., and Xu, Q. (2022). Covariate distribution balance via propensity scores. Journal of Applied Econometrics, 37(6):1093–1120.
- [46] Singh, R. and Sun, L. (2024). Double robustness for complier parameters and a semi-parametric test for complier characteristics. Econometrics Journal, 27(1):1–20.
- [47] Słoczyński, T. (2026). When should we (not) interpret linear IV estimands as LATE? Review of Economic Studies, forthcoming.
- [48] Słoczyński, T., Uysal, S. D., and Wooldridge, J. M. (2022). Doubly robust estimation of local average treatment effects using inverse probability weighted regression adjustment. arXiv:2208.01300.
- [49] Słoczyński, T., Uysal, S. D., and Wooldridge, J. M. (2025). Abadie's kappa and weighting estimators of the local average treatment effect. Journal of Business & Economic Statistics, 43(1):164–177.
- [50] Stevenson, M. T. (2018). Distortion of justice: How the inability to pay bail affects case outcomes. Journal of Law, Economics, and Organization, 34(4):511–542.
- [51] Sun, B. and Tan, Z. (2022). High-dimensional model-assisted inference for local average treatment effects with instrumental variables. Journal of Business & Economic Statistics, 40(4):1732–1744.
- [52] Sun, Z. (2023). Instrument validity for heterogeneous causal effects. Journal of Econometrics, 237(2, Part A):105523.
- [53] Tan, Z. (2006). Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association, 101(476):1607–1618.
- [54] Uysal, S. D. (2011). Doubly robust IV estimation of the local average treatment effect. Unpublished.
- [55] Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge–London, 2nd edition.