Order-Explicit Linearization of High-Dimensional $U$-Statistics

David M. Ritzwoller; Vasilis Syrgkanis

arxiv: 2405.07860 · v4 · pith:5GBWKPA4new · submitted 2024-05-13 · 💰 econ.EM · math.ST· stat.ML· stat.TH

Order-Explicit Linearization of High-Dimensional U-Statistics

David M. Ritzwoller , Vasilis Syrgkanis This is my paper

Pith reviewed 2026-05-24 00:57 UTC · model grok-4.3

classification 💰 econ.EM math.STstat.MLstat.TH

keywords U-statisticsHájek projectionlarge deviation boundshigh-dimensional statisticsconcentration inequalitiesGaussian approximationrandom forestsnonparametric regression

0 comments

The pith

Any U-statistic of order b on n samples deviates from its Hájek projection by at most order O_p(φ b n^{-1} log²(dn)).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives an explicit bound on the difference between a high-dimensional U-statistic and its Hájek projection. For kernels of order b whose coordinates have ψ1-Orlicz norm bounded by φ, the supremum deviation scales as φ b over n times the square of the log of d n. The authors prove this rate is optimal up to the log power and use it to obtain new concentration and approximation theorems. These theorems in turn justify resampling-based simultaneous confidence intervals for a family of subsampled-kernel nonparametric regression estimators that includes several random forest procedures.

Core claim

Any U-statistic of order b on n observations, with a d-dimensional kernel whose coordinates have ψ1-Orlicz norm at most φ, has a maximum deviation from its Hájek projection of order O_p(φ b n^{-1} log²(dn)). The proof relies on new order-explicit moment inequalities for higher-order Hoeffding components, and the rate cannot be improved beyond a polynomial factor in the logarithm.

What carries the argument

Order-explicit moment inequalities for higher-order Hoeffding components that bound the tail of the full U-statistic.

If this is right

Bernstein-type concentration inequalities hold for high-dimensional U-statistics of arbitrary order.
Gaussian approximation results follow for the same class of statistics.
Resampling-based simultaneous confidence intervals are consistent for nonparametric regression estimators built from subsampled kernels.
The consistency result covers several forms of random forest regression, including Generalized Random Forests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same linearization technique may apply directly to other degenerate statistics that admit a Hoeffding decomposition.
Finite-sample simulations with concrete kernels could reveal whether the log-squared factor is necessary in practice.
The bound supplies a quantitative route to uniform inference over many kernel-based estimators without requiring separate case-by-case analysis.

Load-bearing premise

The new moment inequalities for higher-order Hoeffding components hold and are strong enough to control the deviation probability of the entire U-statistic.

What would settle it

An explicit U-statistic of order b whose coordinate-wise deviation from the Hájek projection exceeds C φ b n^{-1} log²(dn) with probability bounded away from zero, for arbitrarily large C.

read the original abstract

We give an order-explicit large deviation bound for the difference between a high-dimensional $U$-statistic and its H\'{a}jek projection. In particular, we show that any $U$-statistic of order $b$ on $n$ observations, with a $d$-dimensional kernel whose coordinates have $\psi_1$-Orlicz norm at most $\phi$, has a maximum deviation from its H\'{a}jek projection of order $O_p(\phi b n^{-1}\log^2(dn))$. The proof relies on the development of novel order-explicit moment inequalities for higher-order Hoeffding components. We show that this rate is unimprovable, up to the polynomial factor on the logarithmic term. As corollaries, we obtain new Bernstein-type concentration and Gaussian approximation results for high-dimensional $U$-statistics. We apply these results to establish the consistency of a set of resampling-based simultaneous confidence intervals built around a class of nonparametric regression estimators constructed with subsampled kernels. This class encompasses several forms of random forest regression, including Generalized Random Forests.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a new order-explicit bound on the linearization error of high-dimensional U-statistics that should be usable for inference around random forest estimators.

read the letter

The main result is a bound showing that any U-statistic of order b with a d-dimensional kernel whose coordinates have ψ1-Orlicz norm at most φ deviates from its Hájek projection by at most order φ b n^{-1} log²(dn) in the max norm. The authors derive this via new moment inequalities on the higher-order Hoeffding components and argue the rate is sharp up to the log factor. They then obtain Bernstein-type concentration and Gaussian approximation corollaries, and apply the bound to show consistency of resampling-based simultaneous intervals for a class of subsampled kernel estimators that includes generalized random forests. This explicit dependence on b and the concrete rate for the high-dimensional case look new compared with earlier U-statistic linearization results. The application section is a natural use case and connects the bound to an active area in econometric machine learning. The central technical step is the moment inequalities themselves; if those hold with the stated constants, the rest follows cleanly. The paper does not appear to hide circularity or rely on fitted quantities. One minor soft spot is that checking the Orlicz-norm condition on the kernels arising in random forests may require additional work in practice, though the authors presumably address this. The overall argument is internally consistent on its own terms. This is a technical tool paper aimed at researchers who need sharp, order-aware bounds for high-dimensional U-statistics in nonparametric and ML settings. It is worth a serious referee because the bound is specific, the rate is usable, and the application is relevant to current work on tree-based estimators.

Referee Report

0 major / 3 minor

Summary. The manuscript derives an order-explicit large-deviation bound showing that any U-statistic of order b on n observations, with a d-dimensional kernel whose coordinates have ψ₁-Orlicz norm at most φ, deviates from its Hájek projection by at most O_p(φ b n^{-1} log²(dn)). The proof rests on newly developed moment inequalities for higher-order Hoeffding components; the rate is asserted to be sharp up to the polynomial log factor. Corollaries include Bernstein-type concentration and Gaussian approximation bounds for high-dimensional U-statistics, which are then used to establish consistency of resampling-based simultaneous confidence intervals for a class of subsampled-kernel nonparametric regression estimators that includes several random-forest variants.

Significance. If the order-explicit moment inequalities hold, the result supplies a technically useful linearization tool whose explicit dependence on the kernel order b is valuable for econometric and machine-learning settings that routinely employ U-statistics of order greater than two. The sharpness claim and the concrete application to random-forest inference add to the contribution; the work is therefore of interest to both theoretical econometricians and practitioners working with high-dimensional nonparametric estimators.

minor comments (3)

§2, notation for the Hoeffding decomposition: the indexing of the projection terms P_k and the associated kernels h_k is introduced without an explicit recursive definition; adding a short display equation that relates h_k to the original kernel would improve readability for readers outside the U-statistic literature.
Theorem 3.1 (the main bound): the statement lists the O_p rate but does not record the implicit constant’s dependence on b; a parenthetical remark clarifying that the constant is at most polynomial in b would make the order-explicit nature of the result fully transparent.
§5, application to random forests: the mapping from the general U-statistic bound to the particular subsampled-kernel estimator is sketched rather than derived step-by-step; inserting one intermediate display that identifies the effective order b and the Orlicz norm φ for the forest kernel would strengthen the link between theory and application.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its significance for econometric and machine-learning applications, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via novel inequalities

full rationale

The paper develops novel order-explicit moment inequalities for higher-order Hoeffding components and uses them to derive the stated large-deviation bound on the U-statistic deviation from its Hájek projection. This is a standard forward derivation from newly established results rather than any reduction by construction, self-definition, fitted-input renaming, or load-bearing self-citation chain. The provided abstract and reader's summary confirm the inequalities are asserted as independent content sufficient for the O_p bound, with no equations or steps shown to collapse back to the target quantity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the validity of newly developed order-explicit moment inequalities for Hoeffding components, which are not detailed in the abstract; the result also invokes the standard Hoeffding decomposition of U-statistics.

axioms (1)

standard math U-statistics admit an orthogonal Hoeffding decomposition into components of increasing order
This is a classical result in U-statistic theory invoked as the basis for controlling the deviation from the Hájek projection.

pith-pipeline@v0.9.0 · 5729 in / 1281 out tokens · 38213 ms · 2026-05-24T00:57:04.813591+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

novel order-explicit moment inequalities for higher-order Hoeffding components... O_p(φ b n^{-1} log²(dn))
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 4.2... Hájek residual... Sherman (1994) symmetrization

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Adaptive discovery of effect modification in matched observational studies
stat.ME 2026-05 unverdicted novelty 5.0

A finite-sample valid method discovers and selects covariate-interpretable subgroups with effect modification in matched observational studies, exactly controlling subgroup-level FDR and incorporating sensitivity anal...

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

and Szepesv´ari, C

Abou-Moustafa, K. and Szepesv´ari, C. (2019). An exponential efron-stein inequality for l q stable learning rules. In Algorithmic Learning Theory, pages 31–63. PMLR. Adler, R. J. and Taylor, J. E. (2009). Random fields and geometry. Springer Science & Business Media. Ai, C. and Chen, X. (2003). Efficient estimation of models with conditional moment restri...

work page 2019
[2]

and Pouzo, D

Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321. Chernozhukov, V ., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters: Double/debiased ...

work page 2012
[3]

Chernozhukov, V ., Chetverikov, D., and Kato, K. (2017b). Detailed proof of nazarov’s inequality.arXiv preprint arXiv:1711.10696. Chernozhukov, V ., Chetverikov, D., Kato, K., and Koike, Y . (2022a). Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics, 50(5):2562–2586. Chernozhukov, V ., Escanciano, J. ...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

and Romano, J

DiCiccio, C. and Romano, J. (2022). Clt for u-statistics with growing dimension. Statistica Sinica, 32(1). Efron, B. and Stein, C. (1981). The jackknife estimate of variance. The Annals of Statistics, pages 586–596. Fix, E. and Hodges, J. L. (1989). Discriminatory analysis. nonparametric discrimination: Consistency properties. International Statistical Re...

work page 2022
[5]

G¨otze, F., Sambale, H., and Sinulis, A. (2021). Concentration inequalities for polynomials in α-sub- exponential random variables. Electronic Journal of Probability, 26(none):1 –

work page 2021
[6]

Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, pages 315–331. H´ajek, J. (1968). Asymptotic normality of simple linear rank statistics under alternatives. The Annals of Mathematical Statistics, pages 325–346. Hanson, D. L. and Wright, F. T. (1971). A bound on tail pro...

work page arXiv 1998
[7]

and McKenzie, D

Kraay, A. and McKenzie, D. (2014). Do poverty traps exist? assessing the evidence. Journal of Economic Perspectives, 28(3):127–148. Kumar, R., Lokshtanov, D., Vassilvitskii, S., and Vattani, A. (2013). Near-optimal bounds for cross-validation via loss stability. In International Conference on Machine Learning, pages 27–35. PMLR. Lewis, G. and Syrgkanis, V...

work page 2014
[8]

Peng, W., Coleman, T., and Mentch, L. (2022). Rates of convergence for random forests via generalized u-statistics. Electronic Journal of Statistics, 16(1):232–292. Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, pages 2031–2050. Politis, D. N., Romano, J. P....

work page arXiv 2022
[9]

and Chernozhukov, V

Semenova, V . and Chernozhukov, V . (2021). Debiased machine learning of conditional average treatment effects and other causal functions. The Econometrics Journal, 24(2):264–289. Sexton, J. and Laake, P. (2009). Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis, 53(3):801–811. Sherman, R. P. (1994). Maximal...

work page 2021
[10]

Wager, S

Cambridge university press. Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242. Wager, S., Hastie, T., and Efron, B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal o...

work page 2018
[11]

Suppose that the Neyman orthogonal moment function M(x; θ0, g0) satisfies Assumption A.1 and Part (i) of Assumption 3.3 and that the centered empirical moment function M n(x; θ0, g0) satisfies Assumption A.2. If Assumption A.3 and Assumption A.4 hold, then the confidence region defined in Definition 2.1 satisfies sup P ∈P P n θ0(x(d)) ∈ ˆC(x(d)) o − (1 − ...

work page 2018
[12]

nX i=1 Zi q #1/q ≲ E

Lemma C.2 implies that E   1 Nb X s∈Sn,b Vsh(x; Ds) q | Dn   ≤ qbq/2   1 Nb X s∈Sn,b n b −1 (h(x; Ds))2   q/2 . (C.10) Consequently, Lemma C.1 implies that E   1 Nb X s∈Sn,b h(x; Ds) q   ≤ 2bqE  E   1 Nb X s∈Sn,b Vsh(x; Ds) q | Dn     ≤ 2bqqbq/2E     1 Nb X s∈Sn,b n b −1 (h(x; Ds))2   q/2  . (C.11) To simplify the expression ...

work page 1948
[13]

implies that P n ∥Λ−1/2Z∥∞ ≥ C p log dn o ≤ n−1 . Thus, Theorem D.1 implies that P n ∥Λ−1/2Rn(x(dn))∥∞ ≥ C p log dn o ≲ φ2 log5(dn) λ2n 1/4 + δn p log d + ρn (D.26) 13 and that P n ∥Λ−1/2R∗ n(x(d))∥∞ ≥ C p log dn | Dn o ≲ φ2 log5(dn) λ2n 1/4 + δn p log d (D.27) with probability greater than 1 − C(n−1/2φλ−1 log3/2(dn) + ρn). In turn, to bound the discrepan...

work page 2018
[14]

In the main text, we assume that the nuisance parameter estimator ˆgn is computed on a separate sample, independent of the dataDn

■ F.3 Stochastic Equicontinuity without Sample Splitting. In the main text, we assume that the nuisance parameter estimator ˆgn is computed on a separate sample, independent of the dataDn. In this section, we give additional conditions under which the nuisance parameter estimator ˆgn can be computed on the same data used to construct the estimator ˆθn(x(d...

work page 2022
[15]

double-centering trick,

First, we require that moment m(Di, g) satisfies a pair of smoothness conditions. Assumption F.1. Define the higher-order variogram V (q)(x; g, g′) = E h m(Di, g) − m(Di, g′) 2q | Xi = x i for each g and g′ in G. For each integer q > 0, the Lipschitz condition |V (q)(x; g, g′) − V (q)(x′; g, g′)| ≲ ∥x − x′∥q ∞ (F.9) holds for each x and x′ in X and the me...

work page 2019
[16]

(F.38) For reasons of space, we give the details of the proof of the upper bound encoded in (4.8)

Define the normalized quantity ˆu(1)(x(d); Di) = Σ−1/2˜u(1)(x(d); Di) . (F.38) For reasons of space, we give the details of the proof of the upper bound encoded in (4.8). The lower bound will follow from an analogous argument, and we note the differences where they occur. Observe that the decomposition (F.35) implies the upper bound P r n b2 Σ−1/2U n,b(x(...

work page 2015
[17]

G.1 Data

Appendix G.3 discusses our simulation calibration. G.1 Data. The data from Banerjee et al. (2015) were acquired from https://dataverse.harvard. edu/dataset.xhtml?persistentId=doi:10.7910/DVN/NHIXNT on September 10,

work page doi:10.7910/dvn/nhixnt 2015
[18]

Here, we consider the data from the graduation program implemented in Ghana, as it has a larger sample size

The data from the graduation program implemented in Pakistan are considered in Chen and Ritzwoller (2023) and Ritzwoller and Romano (2023). Here, we consider the data from the graduation program implemented in Ghana, as it has a larger sample size. The data record measurements of many pre-treatment and post-treatment outcomes for 2,606 individuals in the ...

work page 2023
[19]

In Figure 1 and Figure 2, we set the subsample proportion b/n equal to 0.05

G.2 Parameter Choices. In Figure 1 and Figure 2, we set the subsample proportion b/n equal to 0.05. In constructing the nuisance parameter estimate, we use set b/n = 0.025. We use r = 200 bootstrap replicates throughout. We use 20,000 trees to construct Figure 1 and 2,000 trees in each bootstrap replicate to construct Figure 2 and throughout the simulatio...

work page 2015
[20]

As the data Dh are drawn independently and identically with distribution P in P and we have assumed that δn/2 ≲ δn and ρn/2 ≲ ρn, by setting t = Cδn, Assumption A.3 and the bound (H.23) imply that the event En(t) occurs with probability greater than 1 − Cρn. Thus, Lemma D.1 implies that P n√nR∗ n(x(d)) ∈ R | Dn o − P {Z ∈ R} (H.46) 37 ≤ P ( 2√n nX i∈h ˆu(...

work page 2023
[21]

To handle the term (H.51), we apply the following quantitative central limit theorem, stated as Lemma 4.6 in Chernozhukov et al

Lemma D.1 implies that (H.52) is less than t p log(d). To handle the term (H.51), we apply the following quantitative central limit theorem, stated as Lemma 4.6 in Chernozhukov et al. (2022a). Lemma H.2 (Lemma 4.6, Chernozhukov et al., 2022a) . Consider the setting and assumptions of Lemma H.1. Let Xn = ( X1, . . . , Xn) collect the observed data and let ...

work page 2019
[22]

See Chapter 1 of De la Pena and Gin´e (1999) for a textbook treatment

The second inequality is a L´evy type inequality for independent random vectors, due to Montgomery-Smith (1993). See Chapter 1 of De la Pena and Gin´e (1999) for a textbook treatment. Lemma H.4 (Theorem 1.1.5, De la Pena and Gin´e (1999)). Let Z1, . . . , Zn be independent random vectors in Rd. There exists a universal constants C1 and C2 such that P ( ma...

work page 1993
[23]

(H.62) Now, the choice δ = C r log n n (H.63) gives P Qn ≥ δn 2 ≲ 1 n (H.64) by (H.58)

In particular, as ∥ 2√n ˆu(x(j); Di)∥ψ1 ≤ 2√n φ λ , and max j∈[d] δn/2X k=1 E 2√n ˆu(x(j); Di) = 2δ , (H.61) Lemma H.3 and Lemma H.4 imply that P ( max 1≤k≤δ n 2 2√n kX i=1 ˆu x(d); Di ∞ ≥ C δ log1/2(dn) + 2√n φ λ log2(dn) ) ≲ 1 n . (H.62) Now, the choice δ = C r log n n (H.63) gives P Qn ≥ δn 2 ≲ 1 n (H.64) by (H.58). Plugging this choice into (H.62) yie...

work page 2018
[24]

Observe that 2 n X i∈s ˆu(x(d), Di) − 1 n nX i=1 ˆu(x(d), Di) = 1 n nX i=1 ˜Viˆu(x(d), Di)

Let ˜Vi be a random variable taking the value 1 with i is an element of the subset s and taking the value −1 otherwise. Observe that 2 n X i∈s ˆu(x(d), Di) − 1 n nX i=1 ˆu(x(d), Di) = 1 n nX i=1 ˜Viˆu(x(d), Di) . (J.7) and that the weights ˜Vi are independent and identically distributed Rademacher random variables. Thus, on the event E ′n(t, t0), we have ...

work page 2022
[25]

It will suffice to show that 1 n! X π∈Pn n b −1 ⌊n/b⌋X l=1 F1(x; sπ,l) 2q ≲ 2qn−1bγ(2q)b1/2q. (J.15) 55 In particular, if (J.15) holds, then we have that P    1 n! X π∈Pn n b −1 ⌊n/b⌋X l=1 F1(x; sπ,l) > 2√e qn−1bγ(2q)b1/2q    ≤ 1 exp(q) E " | 1 n! P π∈Pn n b −1P⌊n/b⌋ l=1 F1(x; sπ,l)|2q # 2qn−1bγ(2q)b1/2q 2q ≲ exp(−q), (J.16) by Markov’s inequality. ...

work page 1996
[26]

, Xb be independent centered random variables

Let X1, . . . , Xb be independent centered random variables. For any integer q ≥ 2, we have that ∥ 1 n X Xi∥1 ≲ r Var(Xi)q b + q b ∥ max i∈[b] Xi∥q. In particular, Lemma J.3 implies that ∥ b′ n bX i=1 u(1)(Di) − u(1)(D′ i) ∥q = b′b n ∥1 b bX i=1 u(1)(Di) − u(1)(D′ i) ∥q ≲ b′b n r σ2 b′q b + q ϕ b ! ≲ √ b′b n qϕ, (J.47) where the second inequality follows ...

work page 2015
[27]

FIGURE J.2. CATE Estimates, 2-Fold Cross-Fitting 3.0 3.5 4.0 4.5 5.0 −0.6 −0.3 0.0 0.3 Baseline Assets Baseline Consumption 0.1 0.2 0.3 CATE Notes: Figure 1 displays a heat map giving CATE estimates for the intervention studied in Banerjee et al. (2015) on post-treatment assets, using the the 2-fold cross-split estimator (F.24). Other features of the disp...

work page 2015
[28]

65 FIGURE J.3. Performance, 2-Fold Cross-Fitting Mean Length Max Bias Mean Bias Coverage Lower Coverage Upper Coverage 1.0 2.5 5.0 7.5 1.0 2.5 5.0 7.5 1.0 2.5 5.0 7.5 0.88 0.90 0.92 0.94 0.0020 0.0025 0.0030 0.0035 0.0040 0.90 0.91 0.92 0.93 0.94 0.030 0.035 0.040 0.045 0.050 0.055 0.800 0.825 0.850 0.875 0.900 0.20 0.25 0.30 0.35 Sample Multiplier (h) b ...

work page 2015
[29]

FIGURE J.4. Half-Sample Confidence Region Lower Bound, 2-Fold Cross-Fitting 3.0 3.5 4.0 4.5 5.0 −0.6 −0.3 0.0 0.3 Baseline Assets Baseline Consumption −0.2 −0.1 0.0 0.1 CATE Notes: Figure 2 displays heat maps giving half-sample lower confidence bound for the CATE of the intervention studied in Banerjee et al. (2015) on post-treatment total assets. Other f...

work page 2015
[30]

66 FIGURE J.5. Calibrated CATEs 3.0 3.5 4.0 4.5 5.0 −0.6 −0.3 0.0 0.3 Baseline Assets Baseline Consumption CATE 0.05 0.10 0.15 0.20 0.25 Notes: Figure J.5 displays the “true” value of CATE used in our calibrated simulation. 67 FIGURE J.6. Validation X Y Mean 1e−02 1e+00 1e+02 1e+04 1e−02 1e+00 1e+02 1e+04 1e−02 1e+00 1e+02 1e+04Generated Data X Y Variance...

work page 2015

[1] [1]

and Szepesv´ari, C

Abou-Moustafa, K. and Szepesv´ari, C. (2019). An exponential efron-stein inequality for l q stable learning rules. In Algorithmic Learning Theory, pages 31–63. PMLR. Adler, R. J. and Taylor, J. E. (2009). Random fields and geometry. Springer Science & Business Media. Ai, C. and Chen, X. (2003). Efficient estimation of models with conditional moment restri...

work page 2019

[2] [2]

and Pouzo, D

Chen, X. and Pouzo, D. (2012). Estimation of nonparametric conditional moment models with possibly nonsmooth generalized residuals. Econometrica, 80(1):277–321. Chernozhukov, V ., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters: Double/debiased ...

work page 2012

[3] [3]

Chernozhukov, V ., Chetverikov, D., and Kato, K. (2017b). Detailed proof of nazarov’s inequality.arXiv preprint arXiv:1711.10696. Chernozhukov, V ., Chetverikov, D., Kato, K., and Koike, Y . (2022a). Improved central limit theorem and bootstrap approximations in high dimensions. The Annals of Statistics, 50(5):2562–2586. Chernozhukov, V ., Escanciano, J. ...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

and Romano, J

DiCiccio, C. and Romano, J. (2022). Clt for u-statistics with growing dimension. Statistica Sinica, 32(1). Efron, B. and Stein, C. (1981). The jackknife estimate of variance. The Annals of Statistics, pages 586–596. Fix, E. and Hodges, J. L. (1989). Discriminatory analysis. nonparametric discrimination: Consistency properties. International Statistical Re...

work page 2022

[5] [5]

G¨otze, F., Sambale, H., and Sinulis, A. (2021). Concentration inequalities for polynomials in α-sub- exponential random variables. Electronic Journal of Probability, 26(none):1 –

work page 2021

[6] [6]

Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, pages 315–331. H´ajek, J. (1968). Asymptotic normality of simple linear rank statistics under alternatives. The Annals of Mathematical Statistics, pages 325–346. Hanson, D. L. and Wright, F. T. (1971). A bound on tail pro...

work page arXiv 1998

[7] [7]

and McKenzie, D

Kraay, A. and McKenzie, D. (2014). Do poverty traps exist? assessing the evidence. Journal of Economic Perspectives, 28(3):127–148. Kumar, R., Lokshtanov, D., Vassilvitskii, S., and Vattani, A. (2013). Near-optimal bounds for cross-validation via loss stability. In International Conference on Machine Learning, pages 27–35. PMLR. Lewis, G. and Syrgkanis, V...

work page 2014

[8] [8]

Peng, W., Coleman, T., and Mentch, L. (2022). Rates of convergence for random forests via generalized u-statistics. Electronic Journal of Statistics, 16(1):232–292. Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, pages 2031–2050. Politis, D. N., Romano, J. P....

work page arXiv 2022

[9] [9]

and Chernozhukov, V

Semenova, V . and Chernozhukov, V . (2021). Debiased machine learning of conditional average treatment effects and other causal functions. The Econometrics Journal, 24(2):264–289. Sexton, J. and Laake, P. (2009). Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis, 53(3):801–811. Sherman, R. P. (1994). Maximal...

work page 2021

[10] [10]

Wager, S

Cambridge university press. Wager, S. and Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242. Wager, S., Hastie, T., and Efron, B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal o...

work page 2018

[11] [11]

Suppose that the Neyman orthogonal moment function M(x; θ0, g0) satisfies Assumption A.1 and Part (i) of Assumption 3.3 and that the centered empirical moment function M n(x; θ0, g0) satisfies Assumption A.2. If Assumption A.3 and Assumption A.4 hold, then the confidence region defined in Definition 2.1 satisfies sup P ∈P P n θ0(x(d)) ∈ ˆC(x(d)) o − (1 − ...

work page 2018

[12] [12]

nX i=1 Zi q #1/q ≲ E

Lemma C.2 implies that E   1 Nb X s∈Sn,b Vsh(x; Ds) q | Dn   ≤ qbq/2   1 Nb X s∈Sn,b n b −1 (h(x; Ds))2   q/2 . (C.10) Consequently, Lemma C.1 implies that E   1 Nb X s∈Sn,b h(x; Ds) q   ≤ 2bqE  E   1 Nb X s∈Sn,b Vsh(x; Ds) q | Dn     ≤ 2bqqbq/2E     1 Nb X s∈Sn,b n b −1 (h(x; Ds))2   q/2  . (C.11) To simplify the expression ...

work page 1948

[13] [13]

implies that P n ∥Λ−1/2Z∥∞ ≥ C p log dn o ≤ n−1 . Thus, Theorem D.1 implies that P n ∥Λ−1/2Rn(x(dn))∥∞ ≥ C p log dn o ≲ φ2 log5(dn) λ2n 1/4 + δn p log d + ρn (D.26) 13 and that P n ∥Λ−1/2R∗ n(x(d))∥∞ ≥ C p log dn | Dn o ≲ φ2 log5(dn) λ2n 1/4 + δn p log d (D.27) with probability greater than 1 − C(n−1/2φλ−1 log3/2(dn) + ρn). In turn, to bound the discrepan...

work page 2018

[14] [14]

In the main text, we assume that the nuisance parameter estimator ˆgn is computed on a separate sample, independent of the dataDn

■ F.3 Stochastic Equicontinuity without Sample Splitting. In the main text, we assume that the nuisance parameter estimator ˆgn is computed on a separate sample, independent of the dataDn. In this section, we give additional conditions under which the nuisance parameter estimator ˆgn can be computed on the same data used to construct the estimator ˆθn(x(d...

work page 2022

[15] [15]

double-centering trick,

First, we require that moment m(Di, g) satisfies a pair of smoothness conditions. Assumption F.1. Define the higher-order variogram V (q)(x; g, g′) = E h m(Di, g) − m(Di, g′) 2q | Xi = x i for each g and g′ in G. For each integer q > 0, the Lipschitz condition |V (q)(x; g, g′) − V (q)(x′; g, g′)| ≲ ∥x − x′∥q ∞ (F.9) holds for each x and x′ in X and the me...

work page 2019

[16] [16]

(F.38) For reasons of space, we give the details of the proof of the upper bound encoded in (4.8)

Define the normalized quantity ˆu(1)(x(d); Di) = Σ−1/2˜u(1)(x(d); Di) . (F.38) For reasons of space, we give the details of the proof of the upper bound encoded in (4.8). The lower bound will follow from an analogous argument, and we note the differences where they occur. Observe that the decomposition (F.35) implies the upper bound P r n b2 Σ−1/2U n,b(x(...

work page 2015

[17] [17]

G.1 Data

Appendix G.3 discusses our simulation calibration. G.1 Data. The data from Banerjee et al. (2015) were acquired from https://dataverse.harvard. edu/dataset.xhtml?persistentId=doi:10.7910/DVN/NHIXNT on September 10,

work page doi:10.7910/dvn/nhixnt 2015

[18] [18]

Here, we consider the data from the graduation program implemented in Ghana, as it has a larger sample size

The data from the graduation program implemented in Pakistan are considered in Chen and Ritzwoller (2023) and Ritzwoller and Romano (2023). Here, we consider the data from the graduation program implemented in Ghana, as it has a larger sample size. The data record measurements of many pre-treatment and post-treatment outcomes for 2,606 individuals in the ...

work page 2023

[19] [19]

In Figure 1 and Figure 2, we set the subsample proportion b/n equal to 0.05

G.2 Parameter Choices. In Figure 1 and Figure 2, we set the subsample proportion b/n equal to 0.05. In constructing the nuisance parameter estimate, we use set b/n = 0.025. We use r = 200 bootstrap replicates throughout. We use 20,000 trees to construct Figure 1 and 2,000 trees in each bootstrap replicate to construct Figure 2 and throughout the simulatio...

work page 2015

[20] [20]

As the data Dh are drawn independently and identically with distribution P in P and we have assumed that δn/2 ≲ δn and ρn/2 ≲ ρn, by setting t = Cδn, Assumption A.3 and the bound (H.23) imply that the event En(t) occurs with probability greater than 1 − Cρn. Thus, Lemma D.1 implies that P n√nR∗ n(x(d)) ∈ R | Dn o − P {Z ∈ R} (H.46) 37 ≤ P ( 2√n nX i∈h ˆu(...

work page 2023

[21] [21]

To handle the term (H.51), we apply the following quantitative central limit theorem, stated as Lemma 4.6 in Chernozhukov et al

Lemma D.1 implies that (H.52) is less than t p log(d). To handle the term (H.51), we apply the following quantitative central limit theorem, stated as Lemma 4.6 in Chernozhukov et al. (2022a). Lemma H.2 (Lemma 4.6, Chernozhukov et al., 2022a) . Consider the setting and assumptions of Lemma H.1. Let Xn = ( X1, . . . , Xn) collect the observed data and let ...

work page 2019

[22] [22]

See Chapter 1 of De la Pena and Gin´e (1999) for a textbook treatment

The second inequality is a L´evy type inequality for independent random vectors, due to Montgomery-Smith (1993). See Chapter 1 of De la Pena and Gin´e (1999) for a textbook treatment. Lemma H.4 (Theorem 1.1.5, De la Pena and Gin´e (1999)). Let Z1, . . . , Zn be independent random vectors in Rd. There exists a universal constants C1 and C2 such that P ( ma...

work page 1993

[23] [23]

(H.62) Now, the choice δ = C r log n n (H.63) gives P Qn ≥ δn 2 ≲ 1 n (H.64) by (H.58)

In particular, as ∥ 2√n ˆu(x(j); Di)∥ψ1 ≤ 2√n φ λ , and max j∈[d] δn/2X k=1 E 2√n ˆu(x(j); Di) = 2δ , (H.61) Lemma H.3 and Lemma H.4 imply that P ( max 1≤k≤δ n 2 2√n kX i=1 ˆu x(d); Di ∞ ≥ C δ log1/2(dn) + 2√n φ λ log2(dn) ) ≲ 1 n . (H.62) Now, the choice δ = C r log n n (H.63) gives P Qn ≥ δn 2 ≲ 1 n (H.64) by (H.58). Plugging this choice into (H.62) yie...

work page 2018

[24] [24]

Observe that 2 n X i∈s ˆu(x(d), Di) − 1 n nX i=1 ˆu(x(d), Di) = 1 n nX i=1 ˜Viˆu(x(d), Di)

Let ˜Vi be a random variable taking the value 1 with i is an element of the subset s and taking the value −1 otherwise. Observe that 2 n X i∈s ˆu(x(d), Di) − 1 n nX i=1 ˆu(x(d), Di) = 1 n nX i=1 ˜Viˆu(x(d), Di) . (J.7) and that the weights ˜Vi are independent and identically distributed Rademacher random variables. Thus, on the event E ′n(t, t0), we have ...

work page 2022

[25] [25]

It will suffice to show that 1 n! X π∈Pn n b −1 ⌊n/b⌋X l=1 F1(x; sπ,l) 2q ≲ 2qn−1bγ(2q)b1/2q. (J.15) 55 In particular, if (J.15) holds, then we have that P    1 n! X π∈Pn n b −1 ⌊n/b⌋X l=1 F1(x; sπ,l) > 2√e qn−1bγ(2q)b1/2q    ≤ 1 exp(q) E " | 1 n! P π∈Pn n b −1P⌊n/b⌋ l=1 F1(x; sπ,l)|2q # 2qn−1bγ(2q)b1/2q 2q ≲ exp(−q), (J.16) by Markov’s inequality. ...

work page 1996

[26] [26]

, Xb be independent centered random variables

Let X1, . . . , Xb be independent centered random variables. For any integer q ≥ 2, we have that ∥ 1 n X Xi∥1 ≲ r Var(Xi)q b + q b ∥ max i∈[b] Xi∥q. In particular, Lemma J.3 implies that ∥ b′ n bX i=1 u(1)(Di) − u(1)(D′ i) ∥q = b′b n ∥1 b bX i=1 u(1)(Di) − u(1)(D′ i) ∥q ≲ b′b n r σ2 b′q b + q ϕ b ! ≲ √ b′b n qϕ, (J.47) where the second inequality follows ...

work page 2015

[27] [27]

FIGURE J.2. CATE Estimates, 2-Fold Cross-Fitting 3.0 3.5 4.0 4.5 5.0 −0.6 −0.3 0.0 0.3 Baseline Assets Baseline Consumption 0.1 0.2 0.3 CATE Notes: Figure 1 displays a heat map giving CATE estimates for the intervention studied in Banerjee et al. (2015) on post-treatment assets, using the the 2-fold cross-split estimator (F.24). Other features of the disp...

work page 2015

[28] [28]

65 FIGURE J.3. Performance, 2-Fold Cross-Fitting Mean Length Max Bias Mean Bias Coverage Lower Coverage Upper Coverage 1.0 2.5 5.0 7.5 1.0 2.5 5.0 7.5 1.0 2.5 5.0 7.5 0.88 0.90 0.92 0.94 0.0020 0.0025 0.0030 0.0035 0.0040 0.90 0.91 0.92 0.93 0.94 0.030 0.035 0.040 0.045 0.050 0.055 0.800 0.825 0.850 0.875 0.900 0.20 0.25 0.30 0.35 Sample Multiplier (h) b ...

work page 2015

[29] [29]

FIGURE J.4. Half-Sample Confidence Region Lower Bound, 2-Fold Cross-Fitting 3.0 3.5 4.0 4.5 5.0 −0.6 −0.3 0.0 0.3 Baseline Assets Baseline Consumption −0.2 −0.1 0.0 0.1 CATE Notes: Figure 2 displays heat maps giving half-sample lower confidence bound for the CATE of the intervention studied in Banerjee et al. (2015) on post-treatment total assets. Other f...

work page 2015

[30] [30]

66 FIGURE J.5. Calibrated CATEs 3.0 3.5 4.0 4.5 5.0 −0.6 −0.3 0.0 0.3 Baseline Assets Baseline Consumption CATE 0.05 0.10 0.15 0.20 0.25 Notes: Figure J.5 displays the “true” value of CATE used in our calibrated simulation. 67 FIGURE J.6. Validation X Y Mean 1e−02 1e+00 1e+02 1e+04 1e−02 1e+00 1e+02 1e+04 1e−02 1e+00 1e+02 1e+04Generated Data X Y Variance...

work page 2015