arxiv: 2604.27813 · v1 · submitted 2026-04-30 · 🧮 math.ST · stat.TH


A High Dimensional Wild Bootstrap Max-Test for Detecting the Presence of Significant Predictors

Jonathan B. Hill


Pith reviewed 2026-05-07 05:52 UTC · model grok-4.3


keywords: high-dimensional regression, bootstrap test, max-statistic, predictor screening, time series dependence, wild bootstrap, marginal regression, dependent data


The pith

A wild block bootstrap max-test detects significant predictors in high-dimensional regression even when the number of covariates grows exponentially with sample size under weak dependence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a max-test that takes the largest estimated coefficient across many possible predictors to decide whether any are meaningfully related to the outcome. This avoids estimating the full covariance matrix among all coefficients and instead approximates the test's distribution directly with a wild block bootstrap that respects time-series dependence and allows for non-stationary data. The approach works when the dimension p is much larger than n, provided the log of p grows slower than a power of n that depends on how fast dependence decays. It maintains correct size and shows power even against very weak or sparse signals without needing Bonferroni adjustments or selecting which predictor to test after looking at the data.
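As a rough illustration of the max-statistic idea, the sketch below runs one least-squares regression of the outcome on each predictor separately and takes the largest absolute slope, scaled by the square root of n. This is a minimal sketch under stated assumptions, not the paper's exact statistic: the intercept, the sqrt(n) scaling, and the function names `marginal_slopes` and `max_statistic` are all illustrative choices that this page does not confirm.

```python
import math


def marginal_slopes(y, X):
    """Slope from regressing y on each column of X separately (with intercept).

    y is a list of n outcomes; X is a list of n rows, each with p predictors.
    Assumption: plain least squares, one predictor at a time.
    """
    n = len(y)
    y_bar = sum(y) / n
    slopes = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        x_bar = sum(col) / n
        s_xy = sum((x - x_bar) * (v - y_bar) for x, v in zip(col, y))
        s_xx = sum((x - x_bar) ** 2 for x in col)
        slopes.append(s_xy / s_xx)
    return slopes


def max_statistic(y, X):
    """sqrt(n) times the largest absolute marginal slope (assumed scaling)."""
    n = len(y)
    return math.sqrt(n) * max(abs(b) for b in marginal_slopes(y, X))


# Toy input: two predictors, four observations.
y = [1.0, 2.0, 3.0, 4.0]
X = [[1.0, 0.0], [2.0, 0.0], [3.0, 1.0], [4.0, 1.0]]
stat = max_statistic(y, X)
```

Only p scalar regressions are computed here; no p-by-p covariance matrix is ever formed, which is the computational point the summary makes.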

Core claim

The authors construct a block bootstrap max-test for the presence of significant predictors in high-dimensional marginal regression. By using a max-statistic over the computed parameters and approximating its non-standard limit distribution with a multiplier wild block bootstrap, the procedure controls size and has power against deviations from the null including weak or sparse signals, without covariance matrix estimation, endogenous selection, or post-estimation corrections, provided the data satisfy physical dependence and ln(p) = o(n^a) for a suitable a.

What carries the argument

The max-statistic over the many marginal regression coefficients, with its distribution approximated by a multiplier (wild) block bootstrap that perturbs blocks of observations with random weights to capture dependence.
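The mechanism can be sketched as follows. This is a hedged illustration, not the paper's algorithm: it assumes non-overlapping blocks of a fixed length, one standard normal multiplier shared by every observation in a block, centered marginal-regression scores, and a sqrt(n) scaling. The names `block_multipliers`, `bootstrap_max_draws`, and `bootstrap_p_value` are hypothetical.

```python
import math
import random


def block_multipliers(n, block_len, rng):
    """One N(0,1) draw per block, repeated over the block's time indices."""
    xi = []
    while len(xi) < n:
        draw = rng.gauss(0.0, 1.0)
        xi.extend([draw] * block_len)
    return xi[:n]


def bootstrap_max_draws(y, X, block_len, n_boot, seed=0):
    """Bootstrap draws of the max-statistic under the assumed scheme.

    y is a list of n outcomes; X is a list of n rows with p predictors each.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    y_bar = sum(y) / n
    scores = []  # scores[j][t]: centered contribution of time t to slope j
    for j in range(p):
        col = [row[j] for row in X]
        x_bar = sum(col) / n
        s_xx = sum((x - x_bar) ** 2 for x in col)
        s = [(x - x_bar) * (v - y_bar) / s_xx for x, v in zip(col, y)]
        s_bar = sum(s) / n
        scores.append([v - s_bar for v in s])
    draws = []
    for _ in range(n_boot):
        xi = block_multipliers(n, block_len, rng)
        # Same multipliers for every predictor, so cross-predictor
        # dependence is preserved inside each bootstrap draw.
        biggest = max(abs(sum(w * v for w, v in zip(xi, sj))) for sj in scores)
        draws.append(math.sqrt(n) * biggest)
    return draws


def bootstrap_p_value(observed, draws):
    """Share of bootstrap draws at least as large as the observed statistic."""
    return sum(d >= observed for d in draws) / len(draws)
```

Sharing one multiplier within a block keeps the short-run serial dependence of the scores intact, while independence across blocks supplies the randomness. How the block length should grow with n is not stated on this page and is left as an input.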

If this is right

The test can be used for predictor screening when p grows exponentially in n as long as the log-growth condition holds.
No covariance matrix estimation or Bonferroni correction is required for valid inference.
The procedure maintains power against slight deviations from the null, including sparse weak signals.
It applies directly to heterogeneous and possibly non-stationary time series under the physical dependence assumption.
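To make the growth condition concrete, two elementary cases follow directly from the definition of little-o. The exponent b and the degree k are illustrative symbols, not notation taken from the paper.

```latex
% Exponential-type growth: p = \exp(n^{b}) with b > 0
\ln(p) = n^{b}, \qquad \frac{\ln(p)}{n^{a}} = n^{b-a} \to 0 \iff b < a .

% Polynomial growth: p = n^{k} with k > 0
\ln(p) = k \ln(n), \qquad \frac{k \ln(n)}{n^{a}} \to 0 \quad \text{for every } a > 0 .
```

So polynomial growth in p is always admissible, while exponential-type growth is admissible only when its exponent stays strictly below the a permitted by the dependence and moment conditions.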

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The computational simplicity from avoiding covariance estimation could make the test attractive for screening in very large financial or economic datasets.
The VIX example indicates the method can be applied to volatility modeling where predictors may be weakly dependent.
Similar wild bootstrap approximations for max-statistics might simplify inference in other high-dimensional dependent settings such as macroeconomic forecasting.

Load-bearing premise

The data must obey a physical dependence condition that controls how dependence decays, and the number of predictors p must satisfy a growth restriction relative to sample size n that depends on that decay rate and moment conditions.

What would settle it

A simulation study in which the test rejects the null of no significant predictors at a rate far above the nominal level, while the data satisfy physical dependence and ln(p) lies inside the allowed growth bound, would show that the size control fails.
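One hedged way to quantify "far above the nominal level" is to compare the empirical rejection rate with the binomial Monte Carlo standard error. The sketch below assumes the p-values come from independent replications generated under the null; the function names and the idea of reading the result as a z-score are illustrative, not taken from the paper.

```python
import math


def empirical_size(p_values, alpha=0.05):
    """Fraction of Monte Carlo replications that reject at level alpha."""
    rejections = sum(pv < alpha for pv in p_values)
    return rejections / len(p_values)


def size_z_score(p_values, alpha=0.05):
    """Distance of the empirical size from alpha in Monte Carlo standard errors.

    Assumption: replications are independent, so the rejection count is
    binomial with success probability alpha when size is exactly nominal.
    """
    reps = len(p_values)
    std_err = math.sqrt(alpha * (1.0 - alpha) / reps)
    return (empirical_size(p_values, alpha) - alpha) / std_err


# Toy input: 10 rejections in 100 replications at the 5% level.
toy_p_values = [0.01] * 10 + [0.50] * 90
toy_size = empirical_size(toy_p_values)
toy_z = size_z_score(toy_p_values)
```

A large positive z-score across designs that satisfy the stated assumptions would be evidence of over-rejection; a single moderate value from few replications would not.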

Original abstract

We construct a block bootstrap max-test for detecting the presence of significant predictors in a high dimensional setting, allowing for weakly dependent and heterogeneous (possibly non-stationary) data. The number of covariates to be screened may be large $p \gg n$, and growing at an exponential rate, provided $\ln (p) = o(n^{a})$ for some $a > 0$ that depends on memory decay and the growth of higher moments. We study the problem of correlation screening in a high dimensional marginal regression setting, assuming so-called physical dependence in a time series setting. We entirely sidestep covariance matrix estimation and adaptive re-sampling by working with a max-statistic over the many computed parameters. Thus we do not need endogenous selection of the most relevant predictor index yielding non-uniform asymptotics, nor do we need a post-estimation Bonferroni correction. The non-standard limit distribution arising from the maximum of an increasing number of estimators is easily approximated by a multiplier (wild) block bootstrap. The max-test controls for size well, performs well against various deviations from the null, including very slight deviations with a weak or sparse signal. A numerical experiment is performed and an empirical example with the VIX volatility index is provided.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Hill's wild block bootstrap max-test for high-dimensional marginal screening under physical dependence is a clean way to avoid covariance estimation and selection bias, but the size control rests on dependence decay rates that may be too restrictive for many time series applications.


The paper's main contribution is a multiplier wild block bootstrap for the maximum of marginal regression statistics, designed for p much larger than n under physical dependence. It lets the data be weakly dependent and possibly non-stationary, and claims asymptotic size control when ln(p) = o(n^a) for an a that depends on how fast the dependence coefficients decay and on moment bounds. By working directly with the max statistic it skips covariance matrix estimation and any post-selection Bonferroni adjustment, which removes two common sources of trouble in high-dimensional screening. The numerical study reports reasonable size and power even against weak or sparse alternatives, and the VIX volatility example shows the procedure can be run on real data. Those are the practical upsides.

The soft spot is exactly the rate condition. If the physical dependence decays only polynomially with a modest exponent, a becomes small and the permitted growth in p shrinks sharply; the bootstrap then has to approximate the joint tail of many dependent statistics under conditions that may not hold in persistent series. The abstract states the result but does not spell out how sensitive the constants are or how the block length is chosen in practice, so the size guarantee could be narrower than it first appears. The moment requirements also look standard but would benefit from explicit discussion of what happens when they are only barely satisfied.

This work is aimed at researchers in high-dimensional time series inference who want a screening tool that does not rely on full covariance or adaptive thresholding. A reader already comfortable with Wu's physical dependence framework and with bootstrap methods for maxima will get the most out of the construction and the simulations. The paper is coherent on its own terms and engages the relevant literature without circularity, so it deserves a serious referee. I would send it to review with the expectation that the dependence-rate calculations and block-length robustness get the closest look.

Referee Report

2 major / 3 minor

Summary. The paper constructs a multiplier block bootstrap max-test for detecting significant predictors in high-dimensional marginal regression with weakly dependent or non-stationary time series data under physical dependence. It permits p to grow exponentially with n as long as ln(p) = o(n^a) for a > 0 determined by dependence decay and moments. The method uses the max of marginal statistics and the wild block bootstrap to approximate its limiting distribution, avoiding covariance estimation and Bonferroni corrections. Simulations demonstrate good size control and power against weak and sparse alternatives, with an application to VIX data.

Significance. Should the bootstrap consistency hold, the contribution is significant for high-dimensional inference in time series, providing a practical tool that handles dependence without estimating large covariance matrices. The simulation results for size and power, including slight deviations, and the empirical example with VIX support its utility in applied settings. This could influence methods for variable screening in dependent high-dimensional data.

major comments (2)

[Theorem 3.1 (Bootstrap Consistency)] The result establishing consistency of the wild block bootstrap for the max-statistic is central to the size control. The paper needs to make explicit how the exponent a in the condition ln(p) = o(n^a) is determined by the summability of the physical dependence coefficients (in Wu's sense) and the moment bounds. If the dependence decays only polynomially with a small exponent, the permitted growth for p may be severely restricted, potentially undermining the claim of exponential growth in general weakly dependent settings.
[Section 5 (Numerical Experiments)] The simulations should include scenarios with varying dependence strengths to test the boundary of the rate condition. Currently, if all simulations use strong decay allowing large a, they do not fully validate the size control under the minimal conditions required for the theoretical result.
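A design of the kind this comment asks for could be sketched with a truncated linear process whose moving-average coefficients decay polynomially. For a linear process with i.i.d. innovations, the physical dependence measure at lag j is proportional to the absolute value of the j-th coefficient, so the decay exponent r directly controls how fast dependence fades. This is an illustrative design, not one taken from the paper; the exponent r, the truncation point `trunc`, and the function names are hypothetical.

```python
import random


def decay_coefficients(r, trunc):
    """Moving-average weights (j + 1) ** (-r) for j = 0, ..., trunc - 1."""
    return [(j + 1) ** (-r) for j in range(trunc)]


def polynomial_decay_process(n, r, trunc=200, seed=0):
    """Simulate x_t = sum_j (j + 1)^(-r) * e_{t-j} with N(0,1) innovations.

    Smaller r means slower decay and stronger dependence. The infinite sum
    is truncated at `trunc` lags, which is an approximation.
    """
    rng = random.Random(seed)
    coefs = decay_coefficients(r, trunc)
    # Innovations for times -(trunc - 1), ..., n - 1, stored oldest first.
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + trunc - 1)]
    return [
        sum(c * eps[t + trunc - 1 - j] for j, c in enumerate(coefs))
        for t in range(n)
    ]


# Two toy series of equal length: fast decay versus slow decay.
fast = polynomial_decay_process(100, r=3.0, trunc=50, seed=1)
slow = polynomial_decay_process(100, r=1.1, trunc=50, seed=1)
```

Running the test over a grid of r values, with p fixed relative to n, would show whether empirical size deteriorates as the design approaches the edge of the admissible rate condition.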

minor comments (3)

[Abstract] The phrasing 'ln (p) = o(n^a)' contains unnecessary spaces; consider standard mathematical notation ln(p) = o(n^a).
[Introduction] A brief comparison with existing max-tests or bootstrap methods for high-dimensional data (e.g., references to works on Gaussian approximations or other resampling schemes) would help situate the contribution.
[Empirical Example] More details on the data preprocessing and choice of block length in the VIX application would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate the revisions we plan to incorporate.


Referee: [Theorem 3.1 (Bootstrap Consistency)] The result establishing consistency of the wild block bootstrap for the max-statistic is central to the size control. The paper needs to make explicit how the exponent a in the condition ln(p) = o(n^a) is determined by the summability of the physical dependence coefficients (in Wu's sense) and the moment bounds. If the dependence decays only polynomially with a small exponent, the permitted growth for p may be severely restricted, potentially undermining the claim of exponential growth in general weakly dependent settings.

Authors: We agree that the dependence of the exponent a on the physical dependence coefficients and moment conditions should be stated more explicitly. Theorem 3.1 currently expresses the rate condition in terms of an a > 0 that depends on memory decay and higher-moment growth, but we will add a dedicated remark immediately after the theorem. This remark will derive the admissible range for a from the summability rate of the physical dependence measures (e.g., when the coefficients satisfy a polynomial decay of order r, a is bounded above by a function of r and the moment index). The remark will also note that for very slow polynomial decay the resulting a may be small, thereby restricting the allowable growth of p; this is the expected behavior under the stated assumptions and does not contradict the paper’s claim, which is conditional on the existence of some a > 0. We will revise the manuscript accordingly.
revision: yes
Referee: [Section 5 (Numerical Experiments)] The simulations should include scenarios with varying dependence strengths to test the boundary of the rate condition. Currently, if all simulations use strong decay allowing large a, they do not fully validate the size control under the minimal conditions required for the theoretical result.

Authors: We acknowledge that the existing simulation designs employ dependence structures with relatively rapid decay. To address the referee’s concern, we will expand Section 5 with additional Monte Carlo experiments that vary the decay rate of the physical dependence coefficients, including polynomial decay rates that place the design near the boundary of the admissible a in the condition ln(p) = o(n^a). These new scenarios will report empirical size and power, thereby providing direct numerical support for the theoretical rate restrictions. The revised manuscript will include these results together with a brief discussion of how the chosen decay parameters relate to the conditions of Theorem 3.1.
revision: yes

Circularity Check

0 steps flagged

No circularity: bootstrap consistency follows from standard theory under external physical dependence assumptions


The derivation applies the multiplier block bootstrap to the max-statistic of marginal regression coefficients under the null, with the limiting distribution approximated by applying random multipliers to blocks of observations from the physically dependent process. The rate condition ln(p) = o(n^a) is obtained directly from summability of the Wu physical dependence coefficients together with moment bounds; these are stated as primitive assumptions imported from the existing literature on dependent processes rather than fitted or defined in terms of the target result. The paper explicitly sidesteps covariance estimation and selection by using the max-statistic, whose non-standard limit is handled by the bootstrap without any self-referential equation or reduction of a prediction to a fitted input. No load-bearing step reduces by construction to a self-citation, an ansatz smuggled in via prior work, or a renaming of a known pattern; the central size-control claim therefore rests on independent bootstrap theory for dependent data and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about dependence and moment conditions standard in time series bootstrap literature, with no free parameters or invented entities introduced in the abstract.

axioms (2)

domain assumption: physical dependence in a time series setting
Invoked to justify the block bootstrap for weakly dependent and possibly non-stationary data.
domain assumption: ln(p) = o(n^a) for some a > 0 depending on memory decay and higher moments
Required for the asymptotic validity of the max-test when p grows exponentially with n.



Reference graph

Works this paper leans on

70 extracted references

[1] Andrews, D. W. K. (1988). Laws of large numbers for dependent non-identically distributed random variables. Economet. Theory, 4:458–467.
[2] Andrews, D. W. K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67:1341–1383.
[3] Andrews, D. W. K. and Cheng, X. (2012). Estimation and inference with weak, semi-strong and strong identification. Econometrica, 80:2153–2211.
[4] Andrews, D. W. K. and Cheng, X. (2013). Maximum likelihood estimation and uniform inference with sporadic identification failure. J. Econometrics, 173:36–56.
[5] Andrews, D. W. K. and Cheng, X. (2014). GMM estimation and uniform subvector inference with possible identification failure. Economet. Theory, 30:287–333.
[6] Andrews, D. W. K. and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica, 62:1383–1414.
[7] Angrist, J. and Hahn, J. (2004). When to control for covariates? Panel asymptotics for estimates of treatment effects. Review Econom. Statist., 86:58–72.
[8] Basrak, B., Davis, R. A., and Mikosch, T. (2002). Regular variation of GARCH processes. Stoch. Process. Appl., 99:95–115.
[9] Belloni, A., Chernozhukov, V., and Hansen, C. (2014). High-dimensional methods and inference on structural and treatment effects. J. Econom. Perspect., 28:29–50.
[10] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57:289–300.
[11] Berk, R., Brown, L. D., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Ann. Statist., 41:802–837.
[12] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327.
[13] Bose, A. (1988). Edgeworth correction by bootstrap in autoregressions. Ann. Statist., 16:1709–1722.
[14] Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714–1730.
[15] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, Berlin.
[16] Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist., 39:1496–1525.
[17] Cai, T. T. and Jiang, T. (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. J. Multivariate Anal., 107:24–39.
[18] Chang, J., Chen, X., and Wu, M. (2024). Central limit theorems for high dimensional dependent data. Bernoulli, 30:712–742.
[19] Chang, J., Jiang, Q., and Shao, X. (2023). Testing the martingale difference hypothesis in high dimension. J. Econometrics, 235:972–1000.
[20] Chang, J., Tang, C., and Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. Ann. Statist., 41:2123–2148.
[21] Chen, X. and Kato, K. (2019). Randomized incomplete U-statistics in high dimensions. Ann. Statist., 47:3127–3156.
[22] Cheng, X. (2015). Robust inference in nonlinear models with mixed identification strength. J. Econometrics, 189:207–228.
[23] Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist., 41:2786–2819.
[24] Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probab. Theory Related Fields, 162:47–70.
[25] Correia, S. (2016). A feasible estimator for linear models with multi-way fixed effects. Unpublished manuscript (scorreia.com/research/hdfe.pdf).
[26] Dudoit, S., Shaffer, J. P., and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci., 18:71–103.
[27] Efron, B. (2006). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc., 99:96–104.
[28] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Sanz-Sole, M., Soria, J., Varona, J. L., and Verdera, J., editors, Proc. Internat. Congress of Mathematicians, volume III, pages 595–622, Zurich. European Mathematical Society.
[29] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B, 70:849–911.
[30] Fan, J., Lv, J., and Qi, L. (2011). Sparse high-dimensional models in economics. Annual Econ. Rev., 3:291–317.
[31] Gallant, A. R. and White, H. (1988). A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.
[32] Genovese, C. R., Jin, J., Wasserman, L., and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learning Research, 13:2107–2143.
[33] Giannone, D., Lenza, M., and Primiceri, G. E. (2021). Economic predictions with big data: The illusion of sparsity. Technical Report 2542, European Central Bank.
[34] Giné, E. and Zinn, J. (1990). Bootstrapping general empirical measures. Ann. Probab., 18:851–869.
[35] Hansen, B. E. (1996). Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica, 64:413–430.
[36] Hill, J. B. (2021). Weak identification robust wild bootstrap applied to a consistent model specification test. Economet. Theory, 37:409–463.
[37] Hill, J. B. (2025a). Mixingale and physical dependence equality with applications. Statist. Probab. Letters, 221:110380.
[38] Hill, J. B. (2025b). Testing many zero restrictions in a high dimensional linear regression setting. J. Bus. Econom. Statist., 43:55–67.
[39] Hill, J. B. (2026a). Supplemental material for "A high dimensional wild bootstrap max-test for detecting the presence of significant predictors". Dept. of Economics, University of North Carolina - Chapel Hill.
[40] Hill, J. B. (2026b). Supplemental material for "Max-laws of large numbers for high dimensional arrays with applications". Dept. of Economics, University of North Carolina - Chapel Hill.
[41] Hill, J. B. and Li, T. (2025). A bootstrapped test of covariance stationarity based on orthonormal transformations. Bernoulli, 31:1527–1551.
[42] Hill, J. B. and Motegi, K. (2020). A max-correlation white noise for weakly dependent time series. Economet. Theory, 36:907–960.
[43] Huang, T.-J., McKeague, I. W., and Qian, M. (2019). Marginal screening for high-dimensional predictors of survival outcomes. Stat. Sin., 29:2105–2139.
[44] Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab., 14:865–880.
[45] Kesten, H. (1973). Random difference equations and renewal theory for products of random matrices. Acta Mathematica, 131:207–248.
[46] Kolesár, M., Müller, U. K., and Roeslgaard, S. T. (2024). The fragility of sparsity. Dept. of Economics, Princeton University.
[47] Laber, E. and Murphy, S. A. (2011). Adaptive confidence intervals for the test error in classification. J. Amer. Statist. Assoc., 106:904–913.
[48] Leeb, H. and Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators. Ann. Statist., 34:2554–2591.
[49] Li, D., Liu, W., and Rosalsky, A. (2009). Necessary and sufficient conditions for the asymptotic distribution of the largest entry of a sample correlation matrix. Probab. Theory Related Fields, 148:5–35.
[50] Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Statist., 16:1696–1708.
[51] Liu, W., Lin, Z., and Shao, Q. (2008). The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab., 18:2337–2366.
[52] Ljung, G. M. and Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65:297–303.
[53] McCloskey, A. (2017). Bonferroni-based size-correction for nonstandard testing problems. J. Econometrics, 200:17–35.
[54] McCloskey, A. (2020). Asymptotically uniform tests after consistent model selection in the linear regression model. J. Bus. Econom. Statist., 38:810–825.
[55] McCloskey, A. (2024). Hybrid confidence intervals for informative uniform asymptotic inference after model selection. Biometrika, 111:109–127.
[56] McKeague, I. and Qian, M. (2015). An adaptive resampling test for detecting the presence of significant predictors. J. Amer. Statist. Assoc., 110:1422–1433.
[57] McKeague, I. and Zhang, I. (2022). Significance testing for canonical correlation analysis in high dimensions. Biometrika, 109:1076–1083.
[58] McLeish, D. L. (1975). A maximal inequality and dependent strong laws. Ann. Probab., 3:829–839.
[59] Nemirovski, A. S. (2000). Topics in nonparametric statistics. In Lectures on Probability Theory and Statistics. Springer, Berlin. Lecture Notes in Mathematics, vol. 1738.
[60] Rio, E. (2017). Asymptotic Theory of Weakly Dependent Random Processes. Springer.
[61] Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica, 46:1273–1291.
[62] Shao, Q.-M. and Zhou, W.-X. (2014). Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices. Ann. Probab., 42:623–648.
[63] Shao, X. (2011). A bootstrap-assisted spectral test of white noise under unknown dependence. J. Econometrics, 162:213–224.
[64] Tang, Y., Wang, H. J., and Barut, E. (2018). Testing for the presence of significant covariates through conditional marginal regression. Biometrika, 105:57–71.
[65] Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press, Cambridge, UK.
[66] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50:1–25.
[67] Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci., 102:14150–14154.
[68] Wu, W. B. and Min, M. (2005). On linear processes with dependent innovations. Stochastic Process. Appl., 115:939–958.
[69] Wu, W. B. and Wu, Y. N. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron. J. Statist., 10:352–379.
[70] Zhang, Y. and Laber, E. B. (2015). Comment: An adaptive resampling test for detecting the presence of significant predictors. J. Amer. Statist. Assoc., 110:1451–1454.