arxiv: 2605.10842 · v1 · submitted 2026-05-11 · 💰 econ.EM · math.ST· stat.TH

Recognition: 2 theorem links

· Lean Theorem

Higher-Order Neyman Orthogonality in Moment-Condition Models

Koen Jochmans, Martin Weidner, St\'ephane Bonhomme, Whitney K. Newey

Pith reviewed 2026-05-12 04:13 UTC · model grok-4.3

classification 💰 econ.EM math.STstat.TH

keywords Neyman orthogonalitymoment conditionsnuisance parametershigher-order debiasingeconometric modelsbias reductionsemiparametric estimation

0 comments

The pith

Moment functions in parametric models can be made Neyman-orthogonal to any chosen order using only a fixed number of extra nuisance parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a general construction for moment conditions that stay insensitive to errors in estimating nuisance parameters up to any desired order. This reduces the bias that typically arises when nuisances are plugged in from a first-stage estimator. The construction adds only a constant number of new parameters regardless of how high the order is chosen, and that constant can be made as small as one. If the method works as described, it supplies a single recipe for higher-order debiasing that applies across many different econometric models without the usual growth in complexity. Readers may care because nuisance estimation errors appear in most semiparametric and high-dimensional settings, and controlling their effect at higher orders improves the reliability of the final estimates.

Core claim

We construct moment functions that are Neyman-orthogonal to a chosen order in parametric moment condition models. These moment functions reduce sensitivity to nuisance estimation error and, as such, offer a unified and tractable route to higher-order debiasing in a wide range of econometric models. The number of additional nuisance parameters required by our construction, beyond those already present in the original moment conditions, is independent of the order of orthogonalization and can be reduced to a single scalar if desired.

What carries the argument

Higher-order Neyman-orthogonal moment functions obtained by extending the original moment vector with a finite set of auxiliary conditions whose dimension does not depend on the target orthogonality order.

If this is right

Estimators that use the constructed moments will exhibit bias that vanishes faster with sample size even when the nuisance estimators converge at slower rates.
The same construction supplies higher-order debiasing for any model that can be written as a finite set of moment conditions.
The computational burden of achieving higher orders stays bounded because the number of extra parameters does not increase.
Users can select the order of orthogonality according to the bias reduction needed rather than according to how many new parameters the method would require.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may allow machine-learning estimators of nuisances to be paired with higher-order bias corrections without requiring the learner to achieve faster convergence rates.
In finite samples the method could be combined with cross-fitting or sample splitting to further reduce the impact of nuisance estimation.
The fixed-dimensional extension might be solved explicitly in common models such as those with fixed effects or selection terms, yielding closed-form adjustments.

Load-bearing premise

That a finite extension of the moment conditions always exists which achieves the target order of orthogonality while keeping the size of the extension independent of that order.

What would settle it

A concrete parametric moment condition model in which the minimal number of additional nuisance parameters required to reach k-th order orthogonality grows with k.

Figures

Figures reproduced from arXiv: 2605.10842 by Koen Jochmans, Martin Weidner, St\'ephane Bonhomme, Whitney K. Newey.

**Figure 1.** Figure 1: The five terms of ψ (2) in (24), indexed by rooted trees. The root (blue) carries m or its η-derivatives. Non-root non-leaf nodes carry Λ∂ p η g where p is the node’s number of children. Leaves (gray) carry Λg. with the order of differentiation equal to the number of children of the root. Each leaf carries Λg. Each non-root, non-leaf node carries Λ∂ p η g, where p is the number of children of the node. Eac… view at source ↗

**Figure 2.** Figure 2: The thirteen elements of T3. The first row shows the seven trees in which every non-root node has at most one child — these correspond to the terms of the affine moment function ψ (3) aff . The second row shows the six correction trees, each containing at least one non-root node with two or more children. + 3 2 ∂ 2 ηm(W1) h Λg(W2), Λg(W3) i − ∂ 2 ηm(W1) h Λg(W2), Λ∂ηg(W4) Λg(W3) i − 1 6 ∂ 3 ηm(W1) h Λg… view at source ↗

**Figure 3.** Figure 3: The integers |τ |, d(τ ), and |Aut(τ )| for the 13 trees with d(τ ) ≤ 3. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_3.png] view at source ↗

**Figure 4.** Figure 4: Monte Carlo estimates of θ1 as a function of T. Solid lines show the mean across simulations for OLS and ORTH; the dashed line is the true value; shaded regions are 90% simulation bands. Starting with θ10 – the average of ηi10 – the right graph in [PITH_FULL_IMAGE:figures/full_fig_p028_4.png] view at source ↗

**Figure 5.** Figure 5: Monte Carlo estimates of θ2 as a function of T. Solid lines show the mean across simulations for OLS and ORTH; the dashed line is the true value; shaded regions are 90% simulation bands. The left graph in [PITH_FULL_IMAGE:figures/full_fig_p029_5.png] view at source ↗

**Figure 6.** Figure 6: The 12 trees in T4 that correspond to affine terms in ψ (4) . Example: κτ and |Aut(τ )| for τ corr 15 We illustrate the construction of the kernel κτ from Section 4.2.2 and the recursive computation of |Aut(τ )| from Appendix A.1 on the balanced tree τ corr 15 , in which the root has one child, that child has two children, and each of those grandchildren has two leaf children. Labeling the root by W1, th… view at source ↗

**Figure 7.** Figure 7: The 28 trees in T4 that correspond to non-linear correction terms in ψ (4) . result is κτ corr 15 = m′ η Λ∂ 2 η g(W2) Λ∂ 2 η g(W3) h Λg(W5), Λg(W6) i , Λ∂ 2 η g(W4) h Λg(W7), Λg(W8) i . For |Aut(τ corr 15 )|, apply the recursion at each node: at each middle internal, the two leaf children form a single isomorphism class with n = 2, contributing 2! · 1 2 = 2, so each middle subtree has |Aut| = 2. At the… view at source ↗

read the original abstract

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper constructs moment functions that are Neyman-orthogonal to arbitrary order using a fixed number of extra nuisance parameters, possibly just one scalar.

read the letter

The main thing here is a construction that takes standard parametric moment conditions and augments them so the new moments have all derivatives up to order r with respect to the nuisances equal to zero, for any chosen r. The extra parameters needed for this do not grow with r and can be reduced to a single scalar. This is new relative to the usual first-order Neyman orthogonality results, which stop at the first derivative and do not address higher orders in a uniform way. The authors show how this reduces the impact of nuisance estimation error on the moments, giving a route to higher-order debiasing that stays tractable across a range of moment-based models. The setup is clean and they verify the orthogonality properties under the usual regularity conditions for these models. The single-scalar version is the part that needs the most scrutiny. Higher-order conditions generally produce a growing set of independent equations, so it is not immediate that one auxiliary parameter can satisfy them all simultaneously for arbitrary r. The paper supplies an explicit construction that apparently achieves this, but the details of how the auxiliary is chosen to make the conditions dependent or redundant matter. If that step holds without extra restrictions on the model, the result is solid. This is aimed at econometricians who work with moment estimators and estimated nuisances, such as in semiparametric or high-dimensional settings. It deserves a serious referee because the claim is specific, the authors are careful, and the potential payoff for finite-sample inference is real even if implementation details need checking.

Referee Report

1 major / 2 minor

Summary. The paper constructs moment functions in parametric moment condition models that achieve Neyman orthogonality of arbitrary chosen order. The key feature is that the construction requires only a fixed number of additional nuisance parameters (independent of the target order) that can be reduced to a single scalar, thereby providing a unified and tractable approach to higher-order debiasing across a range of econometric models.

Significance. If the construction is valid and applies generally without hidden restrictions, the result would offer a valuable unification of debiasing techniques in the literature on orthogonal moments and semiparametric estimation. It could simplify higher-order bias corrections in models where nuisance estimation error is a concern, potentially improving finite-sample performance in a broad class of moment-based estimators.

major comments (1)

[Abstract and main construction] The central claim that a single scalar auxiliary parameter suffices for arbitrary order r (Abstract) requires explicit verification that the system of r-th order Gateaux derivative conditions collapses or is satisfied identically. The skeptic's observation that these conditions are typically independent for different r raises a load-bearing concern for the independence-of-order result; the manuscript must derive the auxiliary-parameter equations (likely in the main construction section) and show how one scalar solves them for any fixed r without additional parameters.

minor comments (2)

Clarify the precise definition of the augmented moment function and the role of the original versus additional nuisance parameters to avoid ambiguity in the statement of the orthogonality property.
Provide a simple illustrative example (e.g., a low-dimensional parametric model) early in the paper to demonstrate the construction for r=2 and r=3 with the scalar parameter.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the thoughtful report and for identifying the need for greater explicitness in verifying the single-scalar auxiliary parameter. We address the major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract and main construction] The central claim that a single scalar auxiliary parameter suffices for arbitrary order r (Abstract) requires explicit verification that the system of r-th order Gateaux derivative conditions collapses or is satisfied identically. The skeptic's observation that these conditions are typically independent for different r raises a load-bearing concern for the independence-of-order result; the manuscript must derive the auxiliary-parameter equations (likely in the main construction section) and show how one scalar solves them for any fixed r without additional parameters.

Authors: We agree that the current presentation would benefit from an explicit derivation of the auxiliary-parameter equations and a direct demonstration that a single scalar satisfies the full system for arbitrary r. In the revised manuscript we will insert a dedicated subsection in the main construction that (i) writes out the r-th order Gateaux derivative conditions in terms of the auxiliary parameter, (ii) shows that these conditions reduce to a single scalar equation because of the parametric structure of the original moment conditions, and (iii) verifies that the same scalar solves the system identically for any fixed r. This addition will make the independence-of-order claim fully transparent without altering the substance of the construction. revision: yes

Circularity Check

0 steps flagged

No circularity: explicit construction of higher-order orthogonal moments

full rationale

The paper derives a specific family of augmented moment functions that satisfy the higher-order Neyman orthogonality conditions by direct construction in the parametric moment-condition setting. This construction is presented as a new object whose properties (including order-independent auxiliary dimension) follow from the explicit functional form chosen, without reducing to a fitted parameter, self-referential definition, or load-bearing self-citation chain. No step equates the target result to its own inputs by construction, and the derivation remains self-contained against the stated moment conditions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The work implicitly relies on the standard setup of parametric moment condition models.

axioms (1)

domain assumption Existence of sufficiently smooth moment functions and nuisance estimators in parametric models
Required for Neyman orthogonality of any order to be definable; not stated explicitly but presupposed by the claim.

pith-pipeline@v0.9.0 · 5379 in / 1324 out tokens · 44429 ms · 2026-05-12T04:13:29.280714+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration echoes
We construct moment functions that are Neyman-orthogonal to a chosen order... The number of additional nuisance parameters... can be reduced to a single scalar if desired.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Angrist, J. D. and B. Frandsen (2022). Machine labor. Journal of Labor Economics\/ 40 , S97--S140

work page 2022
[2]

Chernozhukov, and C

Belloni, A., V. Chernozhukov, and C. Hansen (2014). Inference on treatment effects after selection among high-dimensional controls. Review of Economic Studies\/ 81 , 608--650

work page 2014
[3]

Bickel, P. J., C. A. J. Klaassen, Y. Ritov, and J. A. Wellner (1993). Efficient and Adaptive Estimation for Semiparametric Models . Baltimore: Johns Hopkins University Press

work page 1993
[4]

Jochmans, and M

Bonhomme, S., K. Jochmans, and M. Weidner (2025). A N eyman-orthogonalization approach to the incidental parameter problem. Mimeo\/

work page 2025
[5]

Butcher, J. C. (1963). Coefficients for the study of Runge--Kutta integration processes. Journal of the Australian Mathematical Society\/ 3 , 185--201

work page 1963
[6]

Cattaneo, M. D., M. Jansson, and X. Ma (2019). Two-step estimation and inference with possibly many included covariates. The Review of Economic Studies\/ 86 , 1095--1122

work page 2019
[7]

Cayley, A. (1857). On the theory of the analytical forms called trees. Philosophical Magazine\/ 13\/ (85), 172--176

work page
[8]

Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica\/ 60 , 567--596

work page 1992
[9]

Chetverikov, M

Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins (2018). Double/debiased machine learning for treatment and structural parameters. Econometrics Journal\/ 21 , C1--C68

work page 2018
[10]

Chetverikov, D., J. R.-V. Sorensen, and A. Tsyvinski (2026). Triple/double-debiased L asso. arXiv preprint arXiv:2603.20134\/

work page arXiv 2026
[11]

Hahn, J. and W. K. Newey (2004). Jackknife and analytical bias reduction for nonlinear panel models. Econometrica\/ 72 , 1295--1319

work page 2004
[12]

Lubich, and G

Hairer, E., C. Lubich, and G. Wanner (2006). Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations\/ (2nd ed.), Volume 31 of Springer Series in Computational Mathematics . Berlin: Springer-Verlag

work page 2006
[13]

Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica\/ 50 , 1029--1054

work page 1982
[14]

Javanmard, A. and A. Montanari (2014). Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research\/ 15 , 2869--2909

work page 2014
[15]

Jochmans, K. and M. Weidner (2019). Fixed-effect regressions on network data. Econometrica\/ 87 , 1543--1560

work page 2019
[16]

Kline, P., E. K. Rose, and C. R. Walters (2022). Systemic discrimination among large U.S. employers. Quarterly Journal of Economics\/ 137 , 1963--2036

work page 2022
[17]

Saggio, and M

Kline, P., R. Saggio, and M. S lvsten (2020). Leave-out estimation of variance components. Econometrica\/ 88 , 1859--1898

work page 2020
[18]

Lindsay, and R

Li, H., B. Lindsay, and R. Waterman (2003). Efficiency of projected score methods in rectangular array asymptotics. Journal of the Royal Statistical Society, Series B\/ 65 , 191--208

work page 2003
[19]

Syrgkanis, and I

Mackey, L., V. Syrgkanis, and I. Zadik (2018). Orthogonal machine learning: Power and limitations. In International Conference on Machine Learning , pp.\ 3375--3383. PMLR

work page 2018
[20]

McLachlan, R. I., K. Modin, H. Munthe-Kaas, and O. Verdier (2017). Butcher series: A story of rooted trees and numerical methods for evolution equations. Asia Pacific Mathematics Newsletter\/ . arXiv:1512.00906

work page arXiv 2017
[21]

Mikusheva, A. and M. S lvsten (2025). Linear regression with weak exogeneity. Working paper\/

work page 2025
[22]

Newey, W. K. (1994). The asymptotic variance of semiparametric estimators. Econometrica\/ 62 , 1349--1382

work page 1994
[23]

Neyman, J. (1959). Optimal asymptotic tests of composite hypotheses. In U. Grenander (Ed.), Probability and Statistics \/ , 416--444

work page 1959
[24]

Neyman, J. and E. L. Scott (1948). Consistent estimates based on partially consistent observations. Econometrica\/ 16 , 1--32

work page 1948
[25]

Robins, J. M., L. Li, R. Mukherjee, E. T. Tchetgen, and A. van der Vaart (2017). Minimax estimation of a functional on a structured high-dimensional model. Annals of Statistics\/ 45 , 1951--1987

work page 2017
[26]

Robins, J. M., L. Li, E. T. Tchetgen, and A. van der Vaart (2008). Higher order influence functions and minimax estimation of nonlinear functionals. In Probability and Statistics: Essays in Honor of David A. Freedman , Volume 2, pp.\ 335--421. IMS Collections

work page 2008
[27]

Sur, P. and E. J. Cand\`es (2019). A modern maximum-likelihood theory for high-dimensional logistic regression. Proceedings of the National Academy of Sciences\/ 116 , 14516--14525

work page 2019
[28]

Valiente, G. (2002). Algorithms on Trees and Graphs . Berlin: Springer-Verlag

work page 2002
[29]

B \"u hlmann, Y

van de Geer, S., P. B \"u hlmann, Y. Ritov, and R. Dezeure (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics\/ 42 , 1166--1202

work page 2014
[30]

van der Vaart, A. (2014). Higher order tangent spaces and influence functions. Statistical Science\/ 29 , 679--686

work page 2014
[31]

W \"u thrich, K. and Y. Zhu (2023). Omitted variable bias of L asso-based inference methods: A finite sample analysis. Review of Economics and Statistics\/ 105 , 982--997

work page 2023
[32]

Zhang, C.-H. and S. S. Zhang (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B\/ 76 , 217--242

work page 2014