arxiv: 2604.15437 · v1 · submitted 2026-04-16 · 💰 econ.EM

Recognition: unknown

Jackknife Instrumental Variable Inference

Federico Crudu, Giovanni Mellace, Zsolt Sándor


Pith reviewed 2026-05-10 09:25 UTC · model grok-4.3


keywords jackknife · instrumental variables · weak instruments · heteroskedasticity · hypothesis testing · endogeneity · linear regression


The pith

Jackknife-based tests for IV models with many weak instruments reach chi-square limits after a modification to the objective function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a class of jackknife test statistics for linear regression models that have endogenous regressors, heteroskedastic errors, and a large number of potentially weak instruments. The statistics are constructed to test hypotheses on the full parameter vector or on linear restrictions of the parameters. In the limit under the null, the statistics follow a combination of chi-square distributions, yet a simple change to the objective function converts the limit to a standard chi-square distribution. Monte Carlo experiments indicate that the tests maintain competitive size and power in finite samples relative to Anderson-Rubin procedures, and the method is illustrated on an application that uses genetic variants to study the effect of alcohol consumption on body mass index.
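As a reading aid, the setting and the two kinds of hypotheses described above can be written down as follows. The notation is an assumption made here for illustration ($y_i$, $x_i$, $z_i$, $\Pi$, $R$, $q$ are not taken from the paper), and the paper's own assumptions may be stated differently.

```latex
\begin{aligned}
y_i &= x_i'\beta + \varepsilon_i, \qquad x_i = \Pi' z_i + v_i, \qquad i = 1,\dots,n,\\
&\mathbb{E}[\varepsilon_i \mid z_i] = 0, \qquad \mathbb{E}[\varepsilon_i^2 \mid z_i] = \sigma_i^2,\\
H_0^{\text{full}} &: \beta = \beta_0, \qquad H_0^{\text{lin}} : R\beta = q .
\end{aligned}
```

Here endogeneity means that $\varepsilon_i$ and $v_i$ are correlated, heteroskedasticity means that $\sigma_i^2$ may vary with $i$, and "many instruments" means that the dimension of $z_i$ is allowed to grow with $n$.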

Core claim

The authors introduce jackknife instrumental-variable test statistics whose limiting distributions under the null, in the presence of many potentially weak instruments and heteroskedasticity, are combinations of chi-square random variables; by modifying the objective function these limits become ordinary chi-square distributions, delivering usable critical values for inference.

What carries the argument

Jackknife instrumental-variable test statistics obtained by deleting one observation at a time and adjusting the objective function to produce chi-square limits.
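To make the leave-one-out idea concrete, here is a minimal sketch of the textbook jackknife IV point estimator usually labelled JIVE1, assuming no exogenous controls. It is not the paper's test statistic or its modified objective function; the function name `jive1` and the simulated design are hypothetical.

```python
import numpy as np

def jive1(y, X, Z):
    """JIVE1 point estimate using leave-one-out first-stage fitted values.

    y: (n,) outcome, X: (n, p) endogenous regressors, Z: (n, k) instruments.
    """
    # Projection ("hat") matrix of the instruments and its diagonal (leverages).
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    h = np.diag(P)
    # Leave-one-out fitted value for row i: (P_i X - h_i x_i) / (1 - h_i),
    # i.e. the first-stage prediction for i from a fit that omits observation i.
    X_loo = (P @ X - h[:, None] * X) / (1.0 - h)[:, None]
    # IV estimate that uses the leave-one-out fitted values as instruments.
    return np.linalg.solve(X_loo.T @ X, X_loo.T @ y)

# Hypothetical design: one endogenous regressor, ten instruments, true beta = 1.
rng = np.random.default_rng(0)
n, k = 200, 10
Z = rng.normal(size=(n, k))
v = rng.normal(size=n)
eps = 0.5 * v + rng.normal(size=n)      # correlation with v creates endogeneity
x = Z @ np.full(k, 0.3) + v
y = 1.0 * x + eps
beta_hat = jive1(y, x[:, None], Z)
```

The closed-form leverage adjustment avoids refitting the first stage n times; it gives the same fitted values as an explicit loop that drops one observation at a time.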

Load-bearing premise

The regularity conditions needed for the stated limiting distributions of the jackknife statistics hold when the number of instruments grows with the sample size and heteroskedasticity is present.

What would settle it

The claimed size control would be falsified by a simulation experiment in which the empirical rejection rate of the proposed tests under the null deviates substantially from the nominal level, with a large number of instruments and a heteroskedasticity pattern allowed by the maintained assumptions.
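A rejection-rate check of this kind can be sketched as below. Because the paper's statistics are not reproduced on this page, the sketch uses the classical Anderson-Rubin F test as a stand-in; the data-generating process, the `hetero` variance pattern, and all function names are hypothetical. The critical value is a simulated F quantile, chosen only to keep the example NumPy-only.

```python
import numpy as np

def ar_statistic(y, X, Z, beta0):
    """Classical Anderson-Rubin F statistic for H0: beta = beta0 (no controls)."""
    n, k = Z.shape
    e0 = y - X @ beta0                                   # residual imposed by the null
    fitted = Z @ np.linalg.lstsq(Z, e0, rcond=None)[0]   # projection of e0 on Z
    ess = fitted @ fitted                                # e0' P_Z e0
    rss = e0 @ e0 - ess                                  # e0' M_Z e0
    return (ess / k) / (rss / (n - k))

def empirical_size(n=200, k=10, reps=1000, alpha=0.05, hetero=False, seed=0):
    """Share of Monte Carlo draws in which a true null is rejected."""
    rng = np.random.default_rng(seed)
    # Simulated (1 - alpha) quantile of F(k, n - k).
    crit = np.quantile(rng.f(k, n - k, size=200_000), 1 - alpha)
    beta0 = np.array([1.0])
    rejections = 0
    for _ in range(reps):
        Z = rng.normal(size=(n, k))
        v = rng.normal(size=n)
        # Hypothetical heteroskedasticity pattern driven by the first instrument.
        scale = np.sqrt(0.5 + Z[:, 0] ** 2) if hetero else 1.0
        eps = scale * (0.5 * v + rng.normal(size=n))
        x = Z @ np.full(k, 0.1) + v                      # weak first stage
        y = x * beta0[0] + eps                           # data generated under the null
        rejections += ar_statistic(y, x[:, None], Z, beta0) > crit
    return rejections / reps

size_homo = empirical_size()   # homoskedastic normal errors
```

With homoskedastic normal errors the stand-in statistic has an exact F distribution, so `size_homo` should sit near the nominal 5%. Calling `empirical_size(hetero=True)`, raising `k`, or swapping in a different statistic turns the same harness into the stress test described above.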

Figures

Figures reproduced from arXiv: 2604.15437 by Federico Crudu, Giovanni Mellace, Zsolt Sándor.

**Figure 1.** Power curves for DGP1 (n = 200, α = 0.05, r = 32). Results based on 1000 repetitions. The horizontal dotted red line denotes the 5% nominal rejection level, while the vertical dotted black line corresponds to β = 1. Panel (a) plots $D_1^*$, $LM^*$, $W_1^*$ based on the SJIVE objective function; panel (b) plots $W_1^*$, $LM^*$, ARcf based on the JIVE1 objective function. The statistics are all asymptotically $\chi^2_1$ dis…

**Figure 2.** Power curves for DGP1 (n = 200, α = 0.1, r = 32). Results based on 1000 repetitions. The horizontal dotted red line denotes the 5% nominal rejection level, while the vertical dotted black line corresponds to β = 1. Panel (a) plots $D_1^*$, $LM^*$, $W_1^*$ based on the SJIVE objective function; panel (b) plots $W_1^*$, $LM^*$, ARcf based on the JIVE1 objective function. The statistics are all asymptotically $\chi^2_1$ dist…

**Figure 3.** Power curves for DGP2 (n = 200, α = 0.05, r = 0.1). Results based on 1000 repetitions. The horizontal dotted red line denotes the 5% nominal rejection level, while the vertical dotted black line corresponds to β2 = 0.7. Panel (a) plots $D_1^*$, $LM^*$, $W_1^*$ based on the SJIVE objective function; panel (b) plots $W_1^*$, $LM^*$, ARcf based on the JIVE1 objective function. The statistics are all asymptotically $\chi^2_1$…

**Figure 4.** Power curves for DGP2 (n = 200, α = 0.1, r = 0.1). Results based on 1000 repetitions. The horizontal dotted red line denotes the 5% nominal rejection level, while the vertical dotted black line corresponds to β = 0.7. Panel (a) plots $D_1^*$, $LM^*$, $W_1^*$ based on the SJIVE objective function; panel (b) plots $W_1^*$, $LM^*$, ARcf based on the JIVE1 objective function. The statistics are all asymptotically $\chi^2_1$ d…

read the original abstract

This paper introduces a class of jackknife-based test statistics for linear regression models with endogeneity and heteroskedasticity in the presence of many potentially weak instrumental variables. The tests may be used when considering hypotheses on the full parameter vector or hypotheses defined as linear restrictions. We show that in the limit and under the null the proposed statistics are distributed as a combination of chi squares but by modifying the objective function we derive more familiar chi square limits. An extensive simulation study shows the competitive finite sample properties of the proposed tests in particular against Anderson-Rubin-type of statistics. Finally, we provide an empirical illustration that applies the proposed tests to study the effect of alcohol consumption on body mass index using genetic variants as instrumental variables using the UK Biobank.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

Jackknife IV tests extend AR methods for many weak instruments plus heteroskedasticity via objective tweaks and show decent simulations, but the derivations need checking.

read the letter

The main takeaway is a new set of jackknife-based test statistics for linear IV models that accommodate heteroskedasticity and many potentially weak instruments. Under the null they converge to a combination of chi-squares, but the authors modify the objective function to recover standard chi-square limits, and their simulations claim competitive size and power relative to Anderson-Rubin tests. An application to alcohol consumption and BMI with UK Biobank genetic instruments rounds it out.

What the paper does well is target a common practical headache in applied work where instrument counts are high and strength is uncertain. The jackknife construction is a natural way to reduce bias in that regime, and comparing directly to AR statistics makes the contribution easy to locate. The empirical illustration also shows the tests can be implemented on real data without obvious trouble.

The soft spots sit in the parts that are hard to verify from the abstract alone. The objective-function modification that delivers the simpler chi-square limit is presented as straightforward, yet it is not obvious whether it preserves the test's power properties or introduces any finite-sample distortion under the exact conditions the paper assumes. The regularity conditions for the many-weak-instrument asymptotics are the usual ones in this literature, but they can fail to deliver reliable approximations when instruments are extremely weak or heteroskedasticity patterns are extreme; the simulations would need to demonstrate coverage across those edges. Without the full proofs and the precise simulation design, it is difficult to judge how robust the reported advantages really are.

This is a paper for econometricians who routinely run IV regressions with high-dimensional or genetic instruments and want alternative test options. A reader already familiar with weak-instrument literature will get the most value from the simulation comparisons and the application.

It deserves a serious referee because the idea is a targeted, incremental step that addresses a documented gap, even if the proofs and simulation details will require careful revision.

Referee Report

2 major / 2 minor

Summary. This paper introduces a class of jackknife-based test statistics for linear IV regression models with endogeneity and heteroskedasticity, suitable for hypotheses on the full parameter vector or linear restrictions. Under the null and in the limit (including many potentially weak instruments), the statistics converge in distribution to a combination of chi-squares; a modification to the objective function yields standard chi-square limits. An extensive simulation study demonstrates competitive finite-sample size and power properties relative to Anderson-Rubin-type tests, and the methods are illustrated empirically by estimating the effect of alcohol consumption on BMI using genetic variants as instruments in UK Biobank data.

Significance. If the asymptotic derivations hold, the paper makes a useful contribution to the literature on robust inference in IV settings with many weak instruments and heteroskedasticity by extending jackknife techniques to deliver tests with tractable limiting distributions and good finite-sample behavior. The simulation comparisons to Anderson-Rubin benchmarks and the genetic-IV application provide practical value for empirical researchers facing similar identification challenges. The approach's emphasis on handling heteroskedasticity alongside many instruments is a strength, as these features are prevalent in modern microeconometric applications.

major comments (2)

[Asymptotic theory] Asymptotic theory section: the claim that modifying the objective function produces standard chi-square limits under the null requires an explicit verification that this modification preserves the test's validity (i.e., does not alter the null distribution or introduce inconsistencies under local alternatives); the current description leaves unclear whether the modification is data-dependent or fixed in a way that affects the jackknife's robustness properties.
[Simulation study] Simulation study: the data-generating processes, instrument counts, and heteroskedasticity specifications used to demonstrate competitive performance against Anderson-Rubin statistics should be detailed with respect to the many-weak-instrument regime (e.g., number of instruments relative to sample size and concentration parameters); without these, it is difficult to confirm that the reported size control and power gains generalize beyond the specific designs examined.

minor comments (2)

[Abstract and Introduction] The abstract and introduction could more precisely define the jackknife statistics (e.g., the exact form of the leave-one-out estimator or weighting) to aid readers before the technical sections.
[Empirical illustration] In the empirical illustration, reporting the effective number of instruments, first-stage strength diagnostics, and any data exclusion rules would strengthen the connection between the theoretical results and the UK Biobank application.
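One of the first-stage strength diagnostics requested in the last comment could be computed as in the sketch below. It is the plain homoskedastic first-stage F statistic for a single endogenous regressor without an intercept or controls, which is a simplifying assumption; the function name and the simulated data are hypothetical and unrelated to the UK Biobank sample.

```python
import numpy as np

def first_stage_f(x, Z):
    """F statistic for H0: all first-stage coefficients on Z are zero."""
    n, k = Z.shape
    pi_hat = np.linalg.lstsq(Z, x, rcond=None)[0]   # first-stage OLS coefficients
    fitted = Z @ pi_hat
    resid = x - fitted
    # Explained sum of squares per instrument over residual variance.
    return (fitted @ fitted / k) / (resid @ resid / (n - k))

# Hypothetical designs with strong and very weak instruments.
rng = np.random.default_rng(1)
n, k = 500, 5
Z = rng.normal(size=(n, k))
v = rng.normal(size=n)
f_strong = first_stage_f(Z @ np.full(k, 0.5) + v, Z)
f_weak = first_stage_f(Z @ np.full(k, 0.01) + v, Z)
```

Under heteroskedasticity, which is the case the paper targets, a robust variant of this diagnostic would be more appropriate; the homoskedastic form is used here only because it is the simplest to state.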

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment, constructive comments, and recommendation for minor revision. We address each major comment below.

read point-by-point responses

Referee: [Asymptotic theory] Asymptotic theory section: the claim that modifying the objective function produces standard chi-square limits under the null requires an explicit verification that this modification preserves the test's validity (i.e., does not alter the null distribution or introduce inconsistencies under local alternatives); the current description leaves unclear whether the modification is data-dependent or fixed in a way that affects the jackknife's robustness properties.

Authors: We agree that additional explicit verification would strengthen the presentation. The modification is a fixed, non-data-dependent adjustment to the objective function. In the revised manuscript we will insert a new proposition (or remark) in the asymptotic theory section that formally verifies the modified statistic retains the same null limiting distribution (standard chi-square) and that the adjustment is of lower order under local alternatives, thereby preserving consistency and power properties. Because the adjustment does not depend on the data or on the heteroskedasticity structure, it leaves the jackknife's robustness properties unchanged.
revision: yes

Referee: [Simulation study] Simulation study: the data-generating processes, instrument counts, and heteroskedasticity specifications used to demonstrate competitive performance against Anderson-Rubin statistics should be detailed with respect to the many-weak-instrument regime (e.g., number of instruments relative to sample size and concentration parameters); without these, it is difficult to confirm that the reported size control and power gains generalize beyond the specific designs examined.

Authors: We appreciate the referee's request for greater transparency on the many-weak-instrument aspects of the designs. In the revised version we will expand the simulation section with explicit statements of the instrument-to-sample-size ratios (k/n), the concentration-parameter values, and the precise heteroskedasticity specifications used in each Monte Carlo experiment. These additions will be presented in a new table or subsection so that readers can directly assess how the reported size and power results relate to the many-weak-instrument regime.
revision: yes
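The design quantities named in this response can be tabulated with a few lines of code. The sketch below assumes a single endogenous regressor with first-stage coefficients `pi` and first-stage error standard deviation `sigma_v`, and uses the standard concentration parameter, mu2 = pi'Z'Z pi / sigma_v^2. The function name and both designs are hypothetical and are not the paper's Monte Carlo settings.

```python
import numpy as np

def design_summary(Z, pi, sigma_v=1.0):
    """Instrument-to-sample-size ratio and concentration parameter of a design."""
    n, k = Z.shape
    signal = Z @ pi                                # first-stage signal Z pi
    mu2 = float(signal @ signal) / sigma_v ** 2    # concentration parameter
    return {"k_over_n": k / n, "mu2": mu2, "mu2_per_instrument": mu2 / k}

# Two hypothetical designs that differ only in first-stage strength.
rng = np.random.default_rng(2)
n, k = 200, 30
Z = rng.normal(size=(n, k))
weak = design_summary(Z, np.full(k, 0.05))
strong = design_summary(Z, np.full(k, 0.10))
```

Reporting `mu2_per_instrument` next to `k_over_n` separates two different difficulties: how many instruments there are relative to the sample, and how much identifying information each one carries.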

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives limiting distributions for its jackknife IV statistics (combination of chi-squares under the null, or standard chi-square after objective-function modification) from standard many-weak-instrument asymptotics under heteroskedasticity. These steps rely on external regularity conditions and conventional IV limit theory rather than self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations. The simulation comparisons to Anderson-Rubin statistics supply independent finite-sample evidence, and the overall argument remains self-contained against external benchmarks without reducing any central claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard econometric assumptions for linear IV models with endogeneity and heteroskedasticity, plus conditions for many weak instruments to achieve the stated limiting distributions; no new free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: Standard linear IV assumptions including instrument exogeneity, relevance, and heteroskedasticity of errors.
Required for the model setup and asymptotic behavior under the null.

domain assumption: Regularity conditions for jackknife statistics to converge to chi-square limits with many weak instruments.
Invoked to justify the limiting distribution after objective function modification.


