A High Dimensional Wild Bootstrap Max-Test for Detecting the Presence of Significant Predictors
Pith reviewed 2026-05-07 05:52 UTC · model grok-4.3
The pith
A wild block bootstrap max-test detects significant predictors in high-dimensional regression even when the number of covariates grows exponentially with sample size under weak dependence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors construct a block bootstrap max-test for the presence of significant predictors in high-dimensional marginal regression. The test takes a max-statistic over the many estimated marginal coefficients and approximates its non-standard limit distribution with a multiplier wild block bootstrap. The resulting procedure controls size and has power against deviations from the null, including weak or sparse signals. It needs no covariance matrix estimation, endogenous selection or post-estimation corrections, provided the data satisfy physical dependence and ln(p) = o(n^a) for some a > 0 that depends on memory decay and higher moments.
What carries the argument
The max-statistic over the many marginal regression coefficients. Its null distribution is approximated by a multiplier (wild) block bootstrap, which multiplies blocks of observations by a common random weight so that dependence within each block is preserved.
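The following minimal Python sketch illustrates this machinery under several simplifying assumptions that are not taken from the paper:

- Each marginal statistic is the unstudentized OLS slope of y on one predictor plus an intercept.
- The test statistic is sqrt(n) times the largest absolute slope.
- The bootstrap multiplies centred per-observation score contributions by i.i.d. N(0, 1) weights, shared within non-overlapping blocks of a hypothetical length `block_len`.

The function names `marginal_scores` and `max_test` are illustrative and are not the author's code.

```python
import numpy as np


def marginal_scores(y, X):
    """Contributions psi[t, j] with sqrt(n) * slope_j = psi[:, j].sum() / sqrt(n)."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    s2 = (Xc ** 2).mean(axis=0)           # sample variance of each predictor
    return Xc * yc[:, None] / s2          # shape (n, p)


def max_test(y, X, block_len=10, n_boot=499, rng=None):
    """Max-statistic over p marginal OLS slopes, calibrated by a multiplier
    (wild) block bootstrap. Scaling, centring and the N(0, 1) multipliers are
    illustrative assumptions, not necessarily the paper's exact choices."""
    rng = np.random.default_rng(rng)
    n = len(y)
    psi = marginal_scores(y, X)
    slopes = psi.mean(axis=0)             # equals the p marginal OLS slopes
    t_obs = np.sqrt(n) * np.max(np.abs(slopes))

    psi_c = psi - slopes                  # centre so the bootstrap mimics the null
    block_id = np.arange(n) // block_len  # non-overlapping blocks of length block_len
    n_blocks = block_id[-1] + 1

    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        eta = rng.standard_normal(n_blocks)[block_id]  # one weight per block
        t_boot[b] = np.max(np.abs(eta @ psi_c)) / np.sqrt(n)
    p_value = (1 + np.sum(t_boot >= t_obs)) / (1 + n_boot)
    return t_obs, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 1000                      # p > n
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)            # null: no predictor matters
    print(max_test(y, X, rng=1))
    print(max_test(y + 0.5 * X[:, 0], X, rng=1))  # one sparse signal
```

Each bootstrap draw reweights the whole n-by-p matrix of centred contributions. It therefore carries over the cross-sectional dependence among the p slopes and the serial dependence within each block, without ever forming a p-by-p covariance matrix.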
If this is right
- The test can be used for predictor screening when p grows exponentially in n as long as the log-growth condition holds.
- No covariance matrix estimation or Bonferroni correction is required for valid inference.
- The procedure maintains power against slight deviations from the null, including sparse weak signals.
- It applies directly to heterogeneous and possibly non-stationary time series under the physical dependence assumption.
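A small numerical illustration, not taken from the paper, shows why dropping the Bonferroni correction can matter. This Python sketch compares two quantities for the maximum of |Z_j| over p standard normal statistics. The first is the two-sided Bonferroni cutoff. The second is a Monte Carlo quantile of the same maximum when the statistics share a common factor with a hypothetical pairwise correlation `rho`. Under strong cross-sectional correlation, the true quantile lies well below the Bonferroni cutoff. A critical value calibrated by simulation or bootstrap can recover that slack.

```python
from statistics import NormalDist

import numpy as np


def bonferroni_cutoff(p, alpha=0.05):
    """Two-sided Bonferroni critical value for max_j |Z_j| over p tests."""
    return NormalDist().inv_cdf(1 - alpha / (2 * p))


def max_abs_quantile(p, rho, alpha=0.05, n_sim=4000, rng=None):
    """Monte Carlo (1 - alpha) quantile of max_j |Z_j| when the Z_j are
    N(0, 1) with common pairwise correlation rho (one-factor model)."""
    rng = np.random.default_rng(rng)
    common = rng.standard_normal((n_sim, 1))
    idio = rng.standard_normal((n_sim, p))
    Z = np.sqrt(rho) * common + np.sqrt(1 - rho) * idio
    return float(np.quantile(np.abs(Z).max(axis=1), 1 - alpha))


if __name__ == "__main__":
    p = 500
    print("Bonferroni cutoff:", round(bonferroni_cutoff(p), 2))
    for rho in (0.0, 0.5, 0.8):
        print(rho, round(max_abs_quantile(p, rho, rng=0), 2))
```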
Where Pith is reading between the lines
- The computational simplicity from avoiding covariance estimation could make the test attractive for screening in very large financial or economic datasets.
- The VIX example indicates the method can be applied to volatility modeling where predictors may be weakly dependent.
- Similar wild bootstrap approximations for max-statistics might simplify inference in other high-dimensional dependent settings such as macroeconomic forecasting.
Load-bearing premise
The data must obey a physical dependence condition that controls how dependence decays, and the number of predictors p must satisfy a growth restriction relative to sample size n that depends on that decay rate and moment conditions.
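For reference, the physical (functional) dependence measure of Wu (2005) is usually stated as shown below, in its stationary form. Here ε_0' is an independent copy of ε_0. Heterogeneous versions typically allow a time-varying map G_t and take a supremum over t.

The paper's precise link between the decay of δ_q(m) and the exponent a is not reproduced here. The last display only shows why a condition of the form ln(p) = o(n^a) still permits p to grow exponentially.

```latex
% Stationary causal process with i.i.d. innovations, and its coupled copy
\[
X_t = G(\ldots, \varepsilon_{t-1}, \varepsilon_t), \qquad
X_m^{*} = G(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_m), \qquad
\delta_q(m) = \lVert X_m - X_m^{*} \rVert_q .
\]
% AR(1) example with |\phi| < 1: geometric decay
\[
X_t = \phi X_{t-1} + \varepsilon_t
\;\Longrightarrow\;
X_m - X_m^{*} = \phi^{m} (\varepsilon_0 - \varepsilon_0'), \qquad
\delta_q(m) = |\phi|^{m} \, \lVert \varepsilon_0 - \varepsilon_0' \rVert_q .
\]
% Growth condition: \ln p_n = n^{b} with 0 < b < a is admissible
\[
\frac{\ln p_n}{n^{a}} = n^{b-a} \longrightarrow 0
\quad \text{while} \quad
p_n = e^{n^{b}} \text{ grows faster than any power of } n .
\]
```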
What would settle it
Suppose a simulation study produced data that satisfy physical dependence, with p inside the allowed growth bound on ln(p). If the test then rejected the null of no predictors at a rate far above the nominal level, that would show the size control fails.
Original abstract
We construct a block bootstrap max-test for detecting the presence of significant predictors in a high dimensional setting, allowing for weakly dependent and heterogeneous (possibly non-stationary) data. The number of covariates to be screened may be large, $p \gg n$, and growing at an exponential rate, provided $\ln(p) = o(n^{a})$ for some $a > 0$ that depends on memory decay and the growth of higher moments. We study the problem of correlation screening in a high dimensional marginal regression setting, assuming so-called physical dependence in a time series setting. We entirely sidestep covariance matrix estimation and adaptive re-sampling by working with a max-statistic over the many computed parameters. Thus we do not need endogenous selection of the most relevant predictor index yielding non-uniform asymptotics, nor do we need a post-estimation Bonferroni correction. The non-standard limit distribution arising from the maximum of an increasing number of estimators is easily approximated by a multiplier (wild) block bootstrap. The max-test controls for size well, performs well against various deviations from the null, including very slight deviations with a weak or sparse signal. A numerical experiment is performed and an empirical example with the VIX volatility index is provided.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs a multiplier block bootstrap max-test for detecting significant predictors in high-dimensional marginal regression with weakly dependent or non-stationary time series data under physical dependence. It permits p to grow exponentially with n as long as ln(p) = o(n^a) for a > 0 determined by dependence decay and moments. The method uses the max of marginal statistics and the wild block bootstrap to approximate its limiting distribution, avoiding covariance estimation and Bonferroni corrections. Simulations demonstrate good size control and power against weak and sparse alternatives, with an application to VIX data.
Significance. Should the bootstrap consistency hold, the contribution is significant for high-dimensional inference in time series, providing a practical tool that handles dependence without estimating large covariance matrices. The simulation results for size and power, including slight deviations, and the empirical example with VIX support its utility in applied settings. This could influence methods for variable screening in dependent high-dimensional data.
major comments (2)
- [Theorem 3.1 (Bootstrap Consistency)] The result establishing consistency of the wild block bootstrap for the max-statistic is central to the size control. The paper needs to make explicit how the exponent a in the condition ln(p) = o(n^a) is determined by the summability of the physical dependence coefficients (in Wu's sense) and the moment bounds. If the dependence decays only polynomially with a small exponent, the permitted growth for p may be severely restricted, potentially undermining the claim of exponential growth in general weakly dependent settings.
- [Section 5 (Numerical Experiments)] The simulations should include scenarios with varying dependence strengths to test the boundary of the rate condition. Currently, if all simulations use strong decay allowing large a, they do not fully validate the size control under the minimal conditions required for the theoretical result.
minor comments (3)
- [Abstract] The phrasing 'ln (p) = o(n^a)' contains unnecessary spaces; consider standard mathematical notation ln(p) = o(n^a).
- [Introduction] A brief comparison with existing max-tests or bootstrap methods for high-dimensional data (e.g., references to works on Gaussian approximations or other resampling schemes) would help situate the contribution.
- [Empirical Example] More details on the data preprocessing and choice of block length in the VIX application would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate the revisions we plan to incorporate.
Point-by-point responses
- Referee: [Theorem 3.1 (Bootstrap Consistency)] The result establishing consistency of the wild block bootstrap for the max-statistic is central to the size control. The paper needs to make explicit how the exponent a in the condition ln(p) = o(n^a) is determined by the summability of the physical dependence coefficients (in Wu's sense) and the moment bounds. If the dependence decays only polynomially with a small exponent, the permitted growth for p may be severely restricted, potentially undermining the claim of exponential growth in general weakly dependent settings.
Authors: We agree that the dependence of the exponent a on the physical dependence coefficients and moment conditions should be stated more explicitly. Theorem 3.1 currently expresses the rate condition in terms of an a > 0 that depends on memory decay and higher-moment growth, but we will add a dedicated remark immediately after the theorem. This remark will derive the admissible range for a from the summability rate of the physical dependence measures (e.g., when the coefficients satisfy a polynomial decay of order r, a is bounded above by a function of r and the moment index). The remark will also note that for very slow polynomial decay the resulting a may be small, thereby restricting the allowable growth of p; this is the expected behavior under the stated assumptions and does not contradict the paper’s claim, which is conditional on the existence of some a > 0. We will revise the manuscript accordingly. revision: yes
- Referee: [Section 5 (Numerical Experiments)] The simulations should include scenarios with varying dependence strengths to test the boundary of the rate condition. Currently, if all simulations use strong decay allowing large a, they do not fully validate the size control under the minimal conditions required for the theoretical result.
Authors: We acknowledge that the existing simulation designs employ dependence structures with relatively rapid decay. To address the referee’s concern, we will expand Section 5 with additional Monte Carlo experiments that vary the decay rate of the physical dependence coefficients, including polynomial decay rates that place the design near the boundary of the admissible a in the condition ln(p) = o(n^a). These new scenarios will report empirical size and power, thereby providing direct numerical support for the theoretical rate restrictions. The revised manuscript will include these results together with a brief discussion of how the chosen decay parameters relate to the conditions of Theorem 3.1. revision: yes
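One way to generate a design of this kind is sketched below. This is a hypothetical Python sketch, not the authors' planned experiment. Each of the p covariates is an independent truncated linear process with coefficients (j+1)^(-r). For such a process, the physical dependence coefficient at lag m is proportional to (m+1)^(-r).

Values of r close to 1 give slow polynomial decay, since the coefficients stop being summable at r = 1. Larger values of r give fast decay. The truncation lag `trunc` is an assumption made to keep the simulation finite.

```python
import numpy as np


def poly_decay_design(n, p, r, trunc=500, rng=None):
    """Simulate p independent series X_t = sum_{j<trunc} (j+1)**(-r) * eps_{t-j}.

    For this truncated linear process the physical dependence coefficient at
    lag m is |psi_m| * ||eps_0 - eps_0'||_q, i.e. proportional to (m+1)**(-r)
    for m < trunc and zero afterwards.
    """
    rng = np.random.default_rng(rng)
    psi = (np.arange(trunc) + 1.0) ** (-r)          # MA coefficients psi_j
    eps = rng.standard_normal((n + trunc - 1, p))  # i.i.d. N(0, 1) innovations
    X = np.zeros((n, p))
    for j in range(trunc):
        # Row trunc-1+t of eps holds eps_t, so this slice is eps_{t-j}, t = 0..n-1.
        X += psi[j] * eps[trunc - 1 - j : trunc - 1 - j + n]
    return X, psi


def lag1_autocorr(psi):
    """Population lag-1 autocorrelation implied by the MA coefficients."""
    return float(np.sum(psi[:-1] * psi[1:]) / np.sum(psi ** 2))


if __name__ == "__main__":
    for r in (1.1, 1.5, 3.0):                   # slow -> fast memory decay
        X, psi = poly_decay_design(n=400, p=200, r=r, rng=0)
        print(r, X.shape, round(lag1_autocorr(psi), 3))
```

To produce empirical size and power along the decay grid, feed each simulated covariate matrix, together with a response generated under the null or an alternative, to an implementation of the max-test.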
Circularity Check
No circularity: bootstrap consistency follows from standard theory under external physical dependence assumptions
Full rationale
The derivation applies the multiplier block bootstrap to the max-statistic of marginal regression coefficients under the null. The limiting distribution is approximated by randomly reweighting blocks of the observed data under the physical dependence assumption. The rate condition ln(p) = o(n^a) is obtained directly from summability of the Wu physical dependence coefficients together with moment bounds. These are stated as primitive assumptions imported from the existing literature on dependent processes, rather than fitted or defined in terms of the target result. The paper explicitly sidesteps covariance estimation and selection by using the max-statistic, whose extreme-value limit the bootstrap handles without any self-referential equation or reduction of a prediction to a fitted input. No load-bearing step reduces by construction to a self-citation, an ansatz smuggled in via prior work, or a renaming of a known pattern. The central size-control claim therefore rests on independent bootstrap theory for dependent data and is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: physical dependence in a time series setting.
- Domain assumption: ln(p) = o(n^a) for some a > 0 depending on memory decay and higher moments.
Reference graph
Works this paper leans on
- [1] Andrews, D. W. K. (1988). Laws of large numbers for dependent non-identically distributed random variables. Economet. Theory, 4:458–467.
- [2] Andrews, D. W. K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67:1341–1383.
- [3] Andrews, D. W. K. and Cheng, X. (2012). Estimation and inference with weak, semi-strong and strong identification. Econometrica, 80:2153–2211.
- [4] Andrews, D. W. K. and Cheng, X. (2013). Maximum likelihood estimation and uniform inference with sporadic identification failure. J. Econometrics, 173:36–56.
- [5] Andrews, D. W. K. and Cheng, X. (2014). GMM estimation and uniform subvector inference with possible identification failure. Economet. Theory, 30:287–333.
- [6] Andrews, D. W. K. and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica, 62:1383–1414.
- [7] Angrist, J. and Hahn, J. (2004). When to control for covariates? Panel asymptotics for estimates of treatment effects. Review Econom. Statist., 86:58–72.
- [8] Basrak, B., Davis, R. A., and Mikosch, T. (2002). Regular variation of GARCH processes. Stoch. Process. Appl., 99:95–115.
- [9] Belloni, A., Chernozhukov, V., and Hansen, C. (2014). High-dimensional methods and inference on structural and treatment effects. J. Econom. Perspect., 28:29–50.
- [10] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57:289–300.
- [11] Berk, R., Brown, L. D., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Ann. Statist., 41:802–837.
- [12] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327.
- [13] Bose, A. (1988). Edgeworth correction by bootstrap in autoregressions. Ann. Statist., 16:1709–1722.
- [14] Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714–1730.
- [15] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, Berlin.
- [16] Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist., 39:1496–1525.
- [17] Cai, T. T. and Jiang, T. (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. J. Multivariate Anal., 107:24–39.
- [18] Chang, J., Chen, X., and Wu, M. (2024). Central limit theorems for high dimensional dependent data. Bernoulli, 30:712–742.
- [19] Chang, J., Jiang, Q., and Shao, X. (2023). Testing the martingale difference hypothesis in high dimension. J. Econometrics, 235:972–1000.
- [20] Chang, J., Tang, C., and Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. Ann. Statist., 41:2123–2148.
- [21] Chen, X. and Kato, K. (2019). Randomized incomplete U-statistics in high dimensions. Ann. Statist., 47:3127–3156.
- [22] Cheng, X. (2015). Robust inference in nonlinear models with mixed identification strength. J. Econometrics, 189:207–228.
- [23] Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist., 41:2786–2819.
- [24] Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probab. Theory Related Fields, 162:47–70.
- [25] Correia, S. (2016). A feasible estimator for linear models with multi-way fixed effects. Unpublished manuscript (scorreia.com/research/hdfe.pdf).
- [26] Dudoit, S., Shaffer, J. P., and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci., 18:71–103.
- [27] Efron, B. (2006). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc., 99:96–104.
- [28] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Sanz-Sole, M., Soria, J., Varona, J. L., and Verdera, J., editors, Proc. Internat. Congress of Mathematicians, volume III, pages 595–622, Zurich. European Mathematical Society.
- [29] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B, 70:849–911.
- [30] Fan, J., Lv, J., and Qi, L. (2011). Sparse high-dimensional models in economics. Annual Econ. Rev., 3:291–317.
- [31] Gallant, A. R. and White, H. (1988). A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York.
- [32] Genovese, C. R., Jin, J., Wasserman, L., and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learning Research, 13:2107–2143.
- [33] Giannone, D., Lenza, M., and Primiceri, G. E. (2021). Economic predictions with big data: The illusion of sparsity. Technical Report 2542, European Central Bank.
- [34] Gine, E. and Zinn, J. (1990). Bootstrapping general empirical measures. Ann. Probab., 18:851–869.
- [35] Hansen, B. E. (1996). Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica, 64:413–430.
- [36] Hill, J. B. (2021). Weak identification robust wild bootstrap applied to a consistent model specification test. Economet. Theory, 37:409–463.
- [37] Hill, J. B. (2025a). Mixingale and physical dependence equality with applications. Statist. Probab. Letters, 221:110380.
- [38] Hill, J. B. (2025b). Testing many zero restrictions in a high dimensional linear regression setting. J. Bus. Econom. Statist., 43:55–67.
- [39] Hill, J. B. (2026a). Supplemental material for "A high dimensional wild bootstrap max-test for detecting the presence of significant predictors". Dept. of Economics, University of North Carolina - Chapel Hill.
- [40] Hill, J. B. (2026b). Supplemental material for "Max-laws of large numbers for high dimensional arrays with applications". Dept. of Economics, University of North Carolina - Chapel Hill.
- [41] Hill, J. B. and Li, T. (2025). A bootstrapped test of covariance stationarity based on orthonormal transformations. Bernoulli, 31:1527–1551.
- [42] Hill, J. B. and Motegi, K. (2020). A max-correlation white noise test for weakly dependent time series. Economet. Theory, 36:907–960.
- [43] Huang, T.-J., McKeague, I. W., and Qian, M. (2019). Marginal screening for high-dimensional predictors of survival outcomes. Stat. Sin., 29:2105–2139.
- [44] Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab., 14:865–880.
- [45] Kesten, H. (1973). Random difference equations and renewal theory for products of random matrices. Acta Mathematica, 131:207–248.
- [46] Kolesár, M., Müller, U. K., and Roeslgaard, S. T. (2024). The fragility of sparsity. Dept. of Economics, Princeton University.
- [47] Laber, E. and Murphy, S. A. (2011). Adaptive confidence intervals for the test error in classification. J. Amer. Statist. Assoc., 106:904–913.
- [48] Leeb, H. and Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist., 34:2554–2591.
- [49] Li, D., Liu, W., and Rosalsky, A. (2009). Necessary and sufficient conditions for the asymptotic distribution of the largest entry of a sample correlation matrix. Probab. Theory Related Fields, 148:5–35.
- [50] Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Statist., 16:1696–1708.
- [51] Liu, W., Lin, Z., and Shao, Q. (2008). The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab., 18:2337–2366.
- [52] Ljung, G. M. and Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65:297–303.
- [53] McCloskey, A. (2017). Bonferroni-based size-correction for nonstandard testing problems. J. Econometrics, 200:17–35.
- [54] McCloskey, A. (2020). Asymptotically uniform tests after consistent model selection in the linear regression model. J. Bus. Econom. Statist., 38:810–825.
- [55] McCloskey, A. (2024). Hybrid confidence intervals for informative uniform asymptotic inference after model selection. Biometrika, 111:109–127.
- [56] McKeague, I. and Qian, M. (2015). An adaptive resampling test for detecting the presence of significant predictors. J. Amer. Statist. Assoc., 110:1422–1433.
- [57] McKeague, I. and Zhang, I. (2022). Significance testing for canonical correlation analysis in high dimensions. Biometrika, 109:1076–1083.
- [58] McLeish, D. L. (1975). A maximal inequality and dependent strong laws. Ann. Probab., 3:829–839.
- [59] Nemirovski, A. S. (2000). Topics in nonparametric statistics. In Lectures on Probability Theory and Statistics. Springer, Berlin. Lecture Notes in Mathematics, vol. 1738.
- [60] Rio, E. (2017). Asymptotic Theory of Weakly Dependent Random Processes. Springer.
- [61] Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica, 46:1273–1291.
- [62] Shao, Q.-M. and Zhou, W.-X. (2014). Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices. Ann. Probab., 42:623–648.
- [63] Shao, X. (2011). A bootstrap-assisted spectral test of white noise under unknown dependence. J. Econometrics, 162:213–224.
- [64] Tang, Y., Wang, H. J., and Barut, E. (2018). Testing for the presence of significant covariates through conditional marginal regression. Biometrika, 105:57–71.
- [65] Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press, Cambridge, UK.
- [66] White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50:1–25.
- [67] Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci., 102:14150–14154.
- [68] Wu, W. B. and Min, M. (2005). On linear processes with dependent innovations. Stochastic Process. Appl., 115:939–958.
- [69] Wu, W. B. and Wu, Y. N. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron. J. Statist., 10:352–379.
- [70] Zhang, Y. and Laber, E. B. (2015). Comment: An adaptive resampling test for detecting the presence of significant predictors. J. Amer. Statist. Assoc., 110:1451–1454.