pith. machine review for the scientific record.

arXiv: 2605.15115 · v1 · submitted 2026-05-14 · econ.EM · stat.ME

A Practical Guide to Instrumental Variables Methods with Heterogeneous Treatment Effects

Pith reviewed 2026-05-15 02:51 UTC · model grok-4.3

classification: econ.EM, stat.ME
keywords: instrumental variables, LATE, heterogeneous treatment effects, covariates, monotonicity violations, robustness checks, applied econometrics

The pith

Different specifications for covariates in instrumental variables regressions produce distinct weighted averages of covariate-specific local average treatment effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that including covariates in IV models is not neutral: each common specification implicitly applies a different set of weights to the local average treatment effects that vary with covariates. This means researchers can obtain numerically different results from the same data, each with its own causal meaning. When the functional form for covariates is misspecified, the resulting estimate may no longer have a clear causal interpretation as a weighted LATE. The authors therefore advocate for flexible specifications as robustness checks and review tests for the validity of the LATE assumptions including monotonicity. They also provide guidance on software to implement these methods in practice.
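As a minimal illustration of this point (a sketch on simulated data, not the paper's own code), the Python snippet below builds a design with one binary covariate, a binary instrument, and one-sided noncompliance, then compares two 2SLS specifications with the unconditional LATE. All parameter values, variable names, and the `tsls` helper are assumptions made for the example; the closed-form targets follow from the standard algebra for saturated controls and are not quoted from the paper.

```python
import numpy as np

def tsls(y, d, instruments, controls):
    """2SLS coefficient on d; `controls` must include a constant column."""
    X = np.column_stack([d, controls])
    Z = np.column_stack([instruments, controls])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)[0]

# Hypothetical design parameters, indexed by covariate cell x in {0, 1}.
share_x = np.array([0.5, 0.5])          # P(X = x)
p_z = np.array([0.5, 0.05])             # P(Z = 1 | X = x)
complier_share = np.array([0.3, 0.8])   # conditional first stage
tau = np.array([1.0, 3.0])              # covariate-specific LATE

rng = np.random.default_rng(0)
n = 400_000
x = rng.integers(0, 2, size=n)
z = (rng.random(n) < p_z[x]).astype(float)
complier = rng.random(n) < complier_share[x]
d = (complier & (z == 1)).astype(float)  # non-compliers are never-takers
y = 0.5 * x + tau[x] * d + rng.normal(size=n)

controls = np.column_stack([np.ones(n), x])  # saturated in the binary covariate

# Specification 1: a single instrument Z.
est_single = tsls(y, d, z, controls)
# Specification 2: Z interacted with each covariate cell.
est_interacted = tsls(y, d, np.column_stack([z * (x == 0), z * (x == 1)]), controls)

def weighted_avg(weights):
    return float(np.sum(weights * tau) / np.sum(weights))

var_z = p_z * (1 - p_z)
late = weighted_avg(share_x * complier_share)
target_single = weighted_avg(share_x * complier_share * var_z)
target_interacted = weighted_avg(share_x * complier_share**2 * var_z)

print(f"unconditional LATE (population): {late:.3f}")
print(f"single instrument:  estimate {est_single:.3f}, target {target_single:.3f}")
print(f"interacted:         estimate {est_interacted:.3f}, target {target_interacted:.3f}")
```

In this assumed design the cell with the larger effect has a very unbalanced instrument, so the single-instrument specification downweights it relative to the unconditional LATE, and the interacted specification lands in between. All three numbers are valid weighted averages of the same two covariate-specific LATEs.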

Core claim

Different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. Parametric misspecification can undermine the causal interpretation of these estimands, so flexible specifications are essential robustness checks. The paper also covers formal tests for LATE assumptions and methods robust to monotonicity violations.

What carries the argument

Weighted averages of covariate-specific local average treatment effects (LATEs) under different covariate specifications in IV estimation.

If this is right

  • Different covariate specifications imply different weightings over heterogeneous effects.
  • Misspecified parametric forms for covariates can invalidate the causal claims of the IV estimate.
  • Flexible specifications provide necessary checks on the robustness of the results.
  • Tests for the LATE assumptions and monotonicity-robust methods should be standard practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applied researchers might start reporting the implied weights or the range of estimates across specifications to show sensitivity.
  • These considerations could extend to other estimators that rely on weighting, such as in matching or regression discontinuity designs.
  • Software tools that automatically compare multiple specifications could help standardize this practice.

Load-bearing premise

That researchers will correctly implement the flexible specifications and tests in their applications, and that the LATE assumptions plausibly hold in the data.

What would settle it

Finding a dataset where switching from a parametric to a flexible covariate specification changes the IV estimate in a way that cannot be explained by sampling variation alone.
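One way such a check might look in practice is sketched below on simulated data: estimate the same IV model with a linear and a saturated covariate specification, then bootstrap the gap between the two estimates. The data-generating process, the `tsls` and `gap` helpers, and the choice of a percentile bootstrap are all assumptions of this example, not procedures taken from the paper.

```python
import numpy as np

def tsls(y, d, z, controls):
    """Just-identified IV coefficient on d, with controls partialled out of z."""
    z_res = z - controls @ np.linalg.lstsq(controls, z, rcond=None)[0]
    return (z_res @ y) / (z_res @ d)

rng = np.random.default_rng(1)
n = 30_000
x = rng.integers(0, 3, size=n)             # covariate with three cells
p_z = np.array([0.2, 0.8, 0.2])            # P(Z = 1 | X): not linear in x
z = (rng.random(n) < p_z[x]).astype(float)
complier = rng.random(n) < 0.5
d = (complier & (z == 1)).astype(float)
# Outcome depends on x nonlinearly; the effect is 1 for every complier.
y = 2.0 * (x == 1) + 1.0 * d + rng.normal(size=n)

ones = np.ones(n)
linear = np.column_stack([ones, x])                                # parametric
saturated = np.column_stack([ones, x == 1, x == 2]).astype(float)  # flexible

def gap(idx):
    """Linear-specification estimate minus saturated-specification estimate."""
    lin = tsls(y[idx], d[idx], z[idx], linear[idx])
    sat = tsls(y[idx], d[idx], z[idx], saturated[idx])
    return lin - sat

full_gap = gap(np.arange(n))
boot = np.array([gap(rng.integers(0, n, size=n)) for _ in range(200)])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"gap between specifications: {full_gap:.2f}")
print(f"95% bootstrap interval: [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the instrument propensity is not linear in the covariate here, the linear specification fails to remove the confounding through `x`, and the bootstrap interval for the gap stays away from zero. An interval that covers zero would instead be consistent with sampling variation alone.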

Original abstract

Instrumental variables (IV) methods are central to applied microeconomics. While classical approaches assume linear models with constant effects, recent literature has shifted toward the local average treatment effect (LATE) framework to accommodate heterogeneous treatment effects. This paper provides a practical guide to aligning empirical practice with recent theory. We first examine how different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. We then discuss how parametric misspecification can undermine the causal interpretation of these estimands and suggest flexible specifications as essential robustness checks. Finally, we review formal tests for LATE assumptions and methods robust to monotonicity violations. We provide a guide to software implementations to help researchers apply the methods in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript provides a practical guide to instrumental variables (IV) methods under heterogeneous treatment effects. It shows that different covariate specifications recover distinct weighted averages of covariate-specific LATEs, warns that parametric misspecification can undermine causal interpretability of the resulting estimands, reviews formal tests for LATE assumptions and approaches robust to monotonicity violations, and supplies software implementation guidance for applied researchers.

Significance. If followed, the recommendations would help applied microeconomists produce more robust and interpretable IV estimates by aligning practice with established LATE weighting results and by encouraging flexible specifications plus formal assumption tests. The paper's value lies in its synthesis of theory into actionable checks rather than in new theoretical results.

minor comments (2)
  1. [Software implementations] The software section would be strengthened by including short, self-contained code snippets (or repository links) that demonstrate the recommended flexible specifications and the reviewed tests for LATE assumptions.
  2. [Covariate specifications] When discussing the weighting of covariate-specific LATEs, an explicit citation to the relevant 2SLS weighting formula (or a brief derivation) would make the exposition self-contained for readers who have not recently consulted the foundational references.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review. The referee's summary accurately reflects the manuscript's focus on synthesizing LATE weighting results, the risks of parametric misspecification, and practical guidance for applied researchers. We appreciate the recognition that the paper's value lies in translating theory into actionable checks rather than new theoretical contributions. We will make the minor revisions needed to strengthen the presentation.

Circularity Check

0 steps flagged

No significant circularity; claims follow from cited external theory

The paper is an expository guide whose central claims—that different covariate specifications in IV models recover distinct weighted averages of covariate-specific LATEs and that parametric misspecification can alter causal interpretability—directly restate standard results on 2SLS weighting from the existing LATE literature (e.g., the estimator as an integral over conditional LATEs weighted by conditional first-stage strength). These are presented as reviews of prior theory rather than new derivations internal to the paper. Recommendations for flexible specifications and tests are robustness advice, not fitted predictions or self-definitional constructs. Any self-citations are non-load-bearing and do not reduce the argument to unverified internal inputs. The derivation chain is self-contained against external benchmarks.
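The weighting results alluded to here can be written compactly for a binary instrument $Z$, a binary treatment $D$, and discrete covariates $X$ that enter the regression as a full set of indicators. The notation is an assumption of this sketch rather than the paper's own: $\tau(x)$ denotes the covariate-specific LATE and $\pi(x)$ the conditional first stage, which equals the complier share under monotonicity.

```latex
\begin{align*}
\pi(x) &= E[D \mid Z=1, X=x] - E[D \mid Z=0, X=x], \\
\tau_{\mathrm{LATE}} &= \frac{E[\pi(X)\,\tau(X)]}{E[\pi(X)]}, \\
\beta_{\text{single}} &= \frac{E[\pi(X)\,\mathrm{Var}(Z \mid X)\,\tau(X)]}
                              {E[\pi(X)\,\mathrm{Var}(Z \mid X)]}, \\
\beta_{\text{interacted}} &= \frac{E[\pi(X)^{2}\,\mathrm{Var}(Z \mid X)\,\tau(X)]}
                                  {E[\pi(X)^{2}\,\mathrm{Var}(Z \mid X)]}.
\end{align*}
```

Here $\beta_{\text{single}}$ is the 2SLS estimand that uses $Z$ alone as the instrument, and $\beta_{\text{interacted}}$ is the one that uses $Z$ interacted with every covariate cell. All three are weighted averages of $\tau(x)$ with nonnegative weights, but they coincide only in special cases, for example when $\tau(x)$ does not vary with $x$.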

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The guide rests on standard IV and LATE assumptions from prior literature without introducing new fitted parameters or invented entities in the abstract.

axioms (1)
  • domain assumption: Standard IV assumptions (relevance, exclusion restriction, and monotonicity) for LATE identification
    The discussion of weighted LATEs and tests for assumptions directly invokes these classical conditions.
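These conditions have testable implications. With a binary treatment and a binary instrument, instrument validity (independence and exclusion) together with monotonicity implies, for every set A of outcome values, P(Y in A, D=1 | Z=1) >= P(Y in A, D=1 | Z=0) and P(Y in A, D=0 | Z=0) >= P(Y in A, D=0 | Z=1). The Python sketch below evaluates these inequalities on quantile bins of the outcome. It is only a point-estimate check with no inference, not an implementation of any of the formal tests the paper reviews; the simulated designs and function names are assumptions of the example.

```python
import numpy as np

def min_slack(y, d, z, bins=10):
    """Smallest slack across binned LATE inequalities; clearly negative is a red flag."""
    edges = np.quantile(y, np.linspace(0.0, 1.0, bins + 1))

    def joint(d_val, z_val):
        # Estimate P(Y in bin, D = d_val | Z = z_val) for every bin.
        sel = (d == d_val) & (z == z_val)
        return np.histogram(y[sel], bins=edges)[0] / np.sum(z == z_val)

    slack_treated = joint(1, 1) - joint(1, 0)
    slack_untreated = joint(0, 0) - joint(0, 1)
    return float(min(slack_treated.min(), slack_untreated.min()))

def simulate(n, direct_effect, rng):
    """Binary Z and D; `direct_effect` lets Z shift Y for never-takers (exclusion fails)."""
    z = rng.integers(0, 2, size=n)
    u = rng.random(n)
    always = u < 0.2
    complier = (u >= 0.2) & (u < 0.6)
    never = u >= 0.6
    d = np.where(always, 1, np.where(complier, z, 0))
    y = rng.normal(size=n) + 2.0 * d + direct_effect * z * never
    return y, d, z

rng = np.random.default_rng(2)
slack_valid = min_slack(*simulate(100_000, 0.0, rng))
slack_invalid = min_slack(*simulate(100_000, 3.0, rng))
print(f"valid design:   minimum slack {slack_valid:.3f}")
print(f"invalid design: minimum slack {slack_invalid:.3f}")
```

In the valid design every slack is nonnegative in the population, so the minimum should sit near zero up to sampling noise. In the second design the instrument shifts outcomes of never-takers directly, which pushes at least one inequality clearly below zero.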

pith-pipeline@v0.9.0 · 5424 in / 1186 out tokens · 35318 ms · 2026-05-15T02:51:14.635583+00:00 · methodology
