pith. machine review for the scientific record.

arxiv: 2604.27813 · v1 · submitted 2026-04-30 · 🧮 math.ST · stat.TH

Recognition: unknown

A High Dimensional Wild Bootstrap Max-Test for Detecting the Presence of Significant Predictors


Pith reviewed 2026-05-07 05:52 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords high-dimensional regression · bootstrap test · max-statistic · predictor screening · time series dependence · wild bootstrap · marginal regression · dependent data

The pith

A wild block bootstrap max-test detects significant predictors in high-dimensional regression with weakly dependent data, even when the number of covariates grows exponentially with the sample size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a max-test that takes the largest estimated coefficient across many possible predictors to decide whether any are meaningfully related to the outcome. This avoids estimating the full covariance matrix among all coefficients and instead approximates the test's distribution directly with a wild block bootstrap that respects time-series dependence and allows for non-stationary data. The approach works when the dimension p is much larger than n, provided the log of p grows slower than a power of n that depends on how fast dependence decays. It maintains correct size and shows power even against very weak or sparse signals without needing Bonferroni adjustments or selecting which predictor to test after looking at the data.
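To make the construction concrete, here is a minimal sketch of a max-statistic over marginal regression slopes. It assumes no-intercept least squares of the outcome on each covariate separately and a sqrt(n) scaling; the function names `marginal_slopes` and `max_statistic` are hypothetical, and the paper's exact estimator and weighting may differ.

```python
import math
import random

def marginal_slopes(y, X):
    """Least-squares slope of y on each covariate separately (no intercept).

    y: list of n floats; X: list of p columns, each a list of n floats.
    """
    slopes = []
    for col in X:
        sxy = sum(x * v for x, v in zip(col, y))
        sxx = sum(x * x for x in col)
        slopes.append(sxy / sxx)
    return slopes

def max_statistic(y, X):
    """sqrt(n) times the largest absolute marginal slope (illustrative scaling)."""
    n = len(y)
    return math.sqrt(n) * max(abs(b) for b in marginal_slopes(y, X))

if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 200, 500  # more covariates than observations
    X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y_null = [rng.gauss(0, 1) for _ in range(n)]
    y_alt = [0.5 * X[3][t] + rng.gauss(0, 1) for t in range(n)]
    print("no active predictor:", round(max_statistic(y_null, X), 3))
    print("one active predictor:", round(max_statistic(y_alt, X), 3))
```

Only p scalar regressions are computed, so no p-by-p covariance matrix is ever formed. The statistic alone is not a test: its null distribution still has to be approximated, which is the role of the bootstrap.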

Core claim

The authors construct a block bootstrap max-test for the presence of significant predictors in high-dimensional marginal regression. By using a max-statistic over the computed parameters and approximating its non-standard limit distribution with a multiplier wild block bootstrap, the procedure controls size and has power against deviations from the null including weak or sparse signals, without covariance matrix estimation, endogenous selection, or post-estimation corrections, provided the data satisfy physical dependence and ln(p) = o(n^a) for a suitable a.
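As a worked illustration of what the growth condition permits, consider a dimension that grows like a stretched exponential in n. The exponent b below is a hypothetical choice made only for this example; the admissible a is whatever the paper's dependence and moment conditions deliver.

```latex
% Illustrative only: b is a hypothetical exponent, not a quantity from the paper.
p_n = \exp\!\left(n^{b}\right), \quad 0 < b < a
\;\Longrightarrow\;
\frac{\ln p_n}{n^{a}} = n^{\,b-a} \longrightarrow 0 ,
\qquad \text{so } \ln(p_n) = o(n^{a}).
% By contrast, p_n = \exp(n^{a}) gives \ln(p_n)/n^{a} = 1, which does not vanish.
```

So "exponential growth" here means exp(n^b) for any b strictly below a, and the smaller a is, the slower p may grow.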

What carries the argument

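The max-statistic over the many marginal regression coefficients, with its distribution approximated by a multiplier (wild) block bootstrap that perturbs blocks of observations with random multipliers, rather than resampling them, so that dependence within each block is retained.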
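A minimal sketch of a multiplier block bootstrap for a max-statistic follows. Everything specific is an assumption made for illustration: the statistic is the maximum of the scaled sample cross-products between y and each covariate, blocks are non-overlapping with a user-chosen length `block_len`, multipliers are standard normal, and the function names are hypothetical. The paper's block construction, centring and multiplier distribution may differ.

```python
import math
import random

def block_sums(series, block_len):
    """Sum each series over consecutive non-overlapping blocks."""
    n = len(series[0])
    starts = range(0, n, block_len)
    return [[sum(s[a:a + block_len]) for a in starts] for s in series]

def wild_block_bootstrap_pvalue(y, X, block_len, n_boot=500, seed=0):
    """Bootstrap p-value for the statistic max_j |n^{-1/2} sum_t x_{jt} y_t|.

    y: list of n floats; X: list of p columns, each a list of n floats.
    Both are assumed to be (approximately) mean zero.
    """
    rng = random.Random(seed)
    n = len(y)
    scores = [[x * v for x, v in zip(col, y)] for col in X]
    observed = max(abs(sum(s)) for s in scores) / math.sqrt(n)

    # Centre each score series so the bootstrap draws mimic the null
    # even when the data were generated under an alternative.
    centred = []
    for s in scores:
        mean = sum(s) / n
        centred.append([z - mean for z in s])
    sums = block_sums(centred, block_len)
    n_blocks = len(sums[0])

    exceed = 0
    for _ in range(n_boot):
        # One multiplier per block, shared by every covariate, so that
        # cross-sectional and within-block serial dependence are retained.
        xi = [rng.gauss(0, 1) for _ in range(n_blocks)]
        draw = max(abs(sum(w * b for w, b in zip(xi, row))) for row in sums)
        if draw / math.sqrt(n) >= observed:
            exceed += 1
    return exceed / n_boot
```

The null is rejected at level alpha when the returned p-value falls below alpha. Because the same multiplier draw is applied to all p score series, the maximum is recomputed jointly in each draw and no Bonferroni adjustment enters.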

If this is right

  • The test can be used for predictor screening when p grows exponentially in n as long as the log-growth condition holds.
  • No covariance matrix estimation or Bonferroni correction is required for valid inference.
  • The procedure maintains power against slight deviations from the null, including sparse weak signals.
  • It applies directly to heterogeneous and possibly non-stationary time series under the physical dependence assumption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The computational simplicity from avoiding covariance estimation could make the test attractive for screening in very large financial or economic datasets.
  • The VIX example indicates the method can be applied to volatility modeling where predictors may be weakly dependent.
  • Similar wild bootstrap approximations for max-statistics might simplify inference in other high-dimensional dependent settings such as macroeconomic forecasting.

Load-bearing premise

The data must obey a physical dependence condition that controls how dependence decays, and the number of predictors p must satisfy a growth restriction relative to sample size n that depends on that decay rate and moment conditions.
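For orientation, the physical (functional) dependence measure of Wu (2005) can be written as below for a stationary causal process. This is the textbook stationary version, with the moment index written q to avoid a clash with the dimension p; the paper's version for heterogeneous, possibly non-stationary arrays may be stated differently.

```latex
% Stationary causal case, following Wu (2005). The \varepsilon_t are i.i.d. and
% \varepsilon_0' is an independent copy of \varepsilon_0.
X_t = g(\varepsilon_t, \varepsilon_{t-1}, \ldots), \qquad
X_t' = g(\varepsilon_t, \ldots, \varepsilon_1, \varepsilon_0', \varepsilon_{-1}, \ldots),
\\[4pt]
\delta_q(t) = \bigl\| X_t - X_t' \bigr\|_q , \qquad
\sum_{t=0}^{\infty} \delta_q(t) < \infty .
```

The coefficient measures how much the value at time t changes when the innovation at time 0 is replaced, and summability is one common way to formalise weak dependence. How fast the coefficients decay is what feeds into the admissible exponent a.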

What would settle it

A simulation study in which the test rejects the null of no predictors at a rate far above the nominal level, when the data satisfy physical dependence and ln(p) lies inside the allowed growth bound, would show the size control fails.
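Such a check can be sketched as a small Monte Carlo experiment. The design below is an assumption chosen for illustration, not the paper's: covariates and outcome are independent AR(1) series (so the null holds), the test is a multiplier block bootstrap on the maximum scaled cross-product, and all function names and parameter values are hypothetical.

```python
import math
import random

def ar1(n, rho, rng, burn_in=50):
    """AR(1) path x_t = rho * x_{t-1} + e_t with N(0, 1) innovations."""
    x, path = 0.0, []
    for _ in range(n + burn_in):
        x = rho * x + rng.gauss(0, 1)
        path.append(x)
    return path[burn_in:]

def max_test_pvalue(y, X, block_len, n_boot, rng):
    """Multiplier block bootstrap p-value for max_j |n^{-1/2} sum_t x_{jt} y_t|."""
    n = len(y)
    sums, observed = [], 0.0
    for col in X:
        z = [x * v for x, v in zip(col, y)]
        total = sum(z)
        observed = max(observed, abs(total) / math.sqrt(n))
        mean = total / n
        # Block sums of the centred scores (the last block may be shorter).
        row = []
        for a in range(0, n, block_len):
            block = z[a:a + block_len]
            row.append(sum(block) - mean * len(block))
        sums.append(row)
    n_blocks = len(sums[0])
    exceed = 0
    for _ in range(n_boot):
        xi = [rng.gauss(0, 1) for _ in range(n_blocks)]
        draw = max(abs(sum(w * b for w, b in zip(xi, row))) for row in sums)
        if draw / math.sqrt(n) >= observed:
            exceed += 1
    return exceed / n_boot

def empirical_size(n, p, rho, block_len, level=0.05,
                   n_rep=200, n_boot=200, seed=0):
    """Rejection frequency when y is generated independently of every covariate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        X = [ar1(n, rho, rng) for _ in range(p)]
        y = ar1(n, rho, rng)  # independent of X, so the null is true
        if max_test_pvalue(y, X, block_len, n_boot, rng) < level:
            rejections += 1
    return rejections / n_rep

if __name__ == "__main__":
    rate = empirical_size(n=100, p=20, rho=0.5, block_len=5,
                          n_rep=50, n_boot=100)
    print("empirical rejection rate at nominal 5%:", rate)
```

A rejection rate that stays far above the nominal level as n_rep and n grow, in a design that satisfies the stated assumptions, would be the kind of evidence described above. A small run like this one is too noisy to settle anything by itself.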

Original abstract

We construct a block bootstrap max-test for detecting the presence of significant predictors in a high dimensional setting, allowing for weakly dependent and heterogeneous (possibly non-stationary) data. The number of covariates to be screened may be large, $p \gg n$, and growing at an exponential rate, provided $\ln(p) = o(n^{a})$ for some $a > 0$ that depends on memory decay and the growth of higher moments. We study the problem of correlation screening in a high dimensional marginal regression setting, assuming so-called physical dependence in a time series setting. We entirely sidestep covariance matrix estimation and adaptive re-sampling by working with a max-statistic over the many computed parameters. Thus we do not need endogenous selection of the most relevant predictor index yielding non-uniform asymptotics, nor do we need a post-estimation Bonferroni correction. The non-standard limit distribution arising from the maximum of an increasing number of estimators is easily approximated by a multiplier (wild) block bootstrap. The max-test controls for size well, performs well against various deviations from the null, including very slight deviations with a weak or sparse signal. A numerical experiment is performed and an empirical example with the VIX volatility index is provided.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper constructs a multiplier block bootstrap max-test for detecting significant predictors in high-dimensional marginal regression with weakly dependent, heterogeneous and possibly non-stationary time series data under physical dependence. It permits p to grow exponentially with n as long as ln(p) = o(n^a) for some a > 0 determined by the rate of memory decay and the growth of higher moments. The method takes the maximum of the marginal statistics and uses the wild block bootstrap to approximate its limiting distribution, avoiding covariance estimation and Bonferroni corrections. Simulations demonstrate good size control and power against weak and sparse alternatives, with an application to VIX data.

Significance. Should the bootstrap consistency hold, the contribution is significant for high-dimensional inference in time series, providing a practical tool that handles dependence without estimating large covariance matrices. The simulation results for size and power, including slight deviations, and the empirical example with VIX support its utility in applied settings. This could influence methods for variable screening in dependent high-dimensional data.

major comments (2)
  1. [Theorem 3.1 (Bootstrap Consistency)] The result establishing consistency of the wild block bootstrap for the max-statistic is central to the size control. The paper needs to make explicit how the exponent a in the condition ln(p) = o(n^a) is determined by the summability of the physical dependence coefficients (in Wu's sense) and the moment bounds. If the dependence decays only polynomially with a small exponent, the permitted growth for p may be severely restricted, potentially undermining the claim of exponential growth in general weakly dependent settings.
  2. [Section 5 (Numerical Experiments)] The simulations should include scenarios with varying dependence strengths to test the boundary of the rate condition. Currently, if all simulations use strong decay allowing large a, they do not fully validate the size control under the minimal conditions required for the theoretical result.
minor comments (3)
  1. [Abstract] The phrasing 'ln (p) = o(n^a)' contains unnecessary spaces; consider standard mathematical notation ln(p) = o(n^a).
  2. [Introduction] A brief comparison with existing max-tests or bootstrap methods for high-dimensional data (e.g., references to works on Gaussian approximations or other resampling schemes) would help situate the contribution.
  3. [Empirical Example] More details on the data preprocessing and choice of block length in the VIX application would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and indicate the revisions we plan to incorporate.

Point-by-point responses
  1. Referee: [Theorem 3.1 (Bootstrap Consistency)] The result establishing consistency of the wild block bootstrap for the max-statistic is central to the size control. The paper needs to make explicit how the exponent a in the condition ln(p) = o(n^a) is determined by the summability of the physical dependence coefficients (in Wu's sense) and the moment bounds. If the dependence decays only polynomially with a small exponent, the permitted growth for p may be severely restricted, potentially undermining the claim of exponential growth in general weakly dependent settings.

    Authors: We agree that the dependence of the exponent a on the physical dependence coefficients and moment conditions should be stated more explicitly. Theorem 3.1 currently expresses the rate condition in terms of an a > 0 that depends on memory decay and higher-moment growth, but we will add a dedicated remark immediately after the theorem. This remark will derive the admissible range for a from the summability rate of the physical dependence measures (e.g., when the coefficients satisfy a polynomial decay of order r, a is bounded above by a function of r and the moment index). The remark will also note that for very slow polynomial decay the resulting a may be small, thereby restricting the allowable growth of p; this is the expected behavior under the stated assumptions and does not contradict the paper’s claim, which is conditional on the existence of some a > 0. We will revise the manuscript accordingly.

    revision: yes

  2. Referee: [Section 5 (Numerical Experiments)] The simulations should include scenarios with varying dependence strengths to test the boundary of the rate condition. Currently, if all simulations use strong decay allowing large a, they do not fully validate the size control under the minimal conditions required for the theoretical result.

    Authors: We acknowledge that the existing simulation designs employ dependence structures with relatively rapid decay. To address the referee’s concern, we will expand Section 5 with additional Monte Carlo experiments that vary the decay rate of the physical dependence coefficients, including polynomial decay rates that place the design near the boundary of the admissible a in the condition ln(p) = o(n^a). These new scenarios will report empirical size and power, thereby providing direct numerical support for the theoretical rate restrictions. The revised manuscript will include these results together with a brief discussion of how the chosen decay parameters relate to the conditions of Theorem 3.1.

    revision: yes
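One way such designs could be generated is sketched below, as an illustration rather than the authors' planned experiment. It simulates a linear process whose moving-average coefficients decay polynomially; for a linear process with i.i.d. innovations, the physical dependence coefficient at lag k is proportional to the absolute value of the k-th coefficient, so the decay order `r` directly controls the dependence decay. The truncation at `n_lags`, the Gaussian innovations and the function names are assumptions.

```python
import random

def poly_decay_series(n, r, n_lags=200, seed=0):
    """Simulate x_t = sum_{k=0}^{n_lags} (k + 1)^(-r) * e_{t-k}, e_t ~ N(0, 1).

    Smaller r means slower decay of the coefficients and stronger dependence.
    The infinite moving average is truncated at n_lags (an approximation).
    """
    rng = random.Random(seed)
    coeffs = [(k + 1) ** (-r) for k in range(n_lags + 1)]
    e = [rng.gauss(0, 1) for _ in range(n + n_lags)]
    # Time t sits at index t + n_lags in e, so e_{t-k} is e[t + n_lags - k].
    return [sum(c * e[t + n_lags - k] for k, c in enumerate(coeffs))
            for t in range(n)]

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here only as a quick diagnostic."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

if __name__ == "__main__":
    for r in (1.2, 2.0, 4.0):
        x = poly_decay_series(n=500, r=r, n_lags=100, seed=1)
        print("r =", r, "lag-1 autocorrelation:", round(lag1_autocorr(x), 3))
```

Feeding series like these into a size experiment for a grid of r values would show how rejection rates behave as the design moves toward slowly decaying dependence.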

Circularity Check

0 steps flagged

No circularity: bootstrap consistency follows from standard theory under external physical dependence assumptions

full rationale

The derivation applies the multiplier block bootstrap to the max-statistic of marginal regression coefficients under the null, with the limiting distribution approximated via resampling of the physical dependence process. The rate condition ln(p)=o(n^a) is obtained directly from summability of the Wu physical dependence coefficients together with moment bounds; these are stated as primitive assumptions imported from the existing literature on dependent processes rather than fitted or defined in terms of the target result. The paper explicitly sidesteps covariance estimation and selection by using the max-statistic, whose extreme-value limit is handled by the bootstrap without any self-referential equation or reduction of a prediction to a fitted input. No load-bearing step reduces by construction to a self-citation, ansatz smuggled via prior work, or renaming of a known pattern; the central size-control claim therefore rests on independent bootstrap theory for dependent data and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about dependence and moment conditions standard in time series bootstrap literature, with no free parameters or invented entities introduced in the abstract.

axioms (2)
  • domain assumption physical dependence in a time series setting
    Invoked to justify the block bootstrap for weakly dependent and possibly non-stationary data.
  • domain assumption ln(p) = o(n^a) for some a > 0 depending on memory decay and higher moments
    Required for the asymptotic validity of the max-test when p grows exponentially with n.

pith-pipeline@v0.9.0 · 5518 in / 1337 out tokens · 34807 ms · 2026-05-07T05:52:15.915076+00:00 · methodology


Reference graph

Works this paper leans on

70 extracted references

  1. Andrews, D. W. K. (1988). Laws of large numbers for dependent non-identically distributed random variables. Economet. Theory, 4:458–467
  2. Andrews, D. W. K. (1999). Estimation when a parameter is on a boundary. Econometrica, 67:1341–1383
  3. Andrews, D. W. K. and Cheng, X. (2012). Estimation and inference with weak, semi-strong and strong identification. Econometrica, 80:2153–2211
  4. Andrews, D. W. K. and Cheng, X. (2013). Maximum likelihood estimation and uniform inference with sporadic identification failure. J. Econometrics, 173:36–56
  5. Andrews, D. W. K. and Cheng, X. (2014). GMM estimation and uniform subvector inference with possible identification failure. Economet. Theory, 30:287–333
  6. Andrews, D. W. K. and Ploberger, W. (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica, 62:1383–1414
  7. Angrist, J. and Hahn, J. (2004). When to control for covariates? Panel asymptotics for estimates of treatment effects. Review Econom. Statist., 86:58–72
  8. Basrak, B., Davis, R. A., and Mikosch, T. (2002). Regular variation of GARCH processes. Stoch. Process. Appl., 99:95–115
  9. Belloni, A., Chernozhukov, V., and Hansen, C. (2014). High-dimensional methods and inference on structural and treatment effects. J. Econom. Perspect., 28:29–50
  10. Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B, 57:289–300
  11. Berk, R., Brown, L. D., Buja, A., Zhang, K., and Zhao, L. (2013). Valid post-selection inference. Ann. Statist., 41:802–837
  12. Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31:307–327
  13. Bose, A. (1988). Edgeworth correction by bootstrap in autoregressions. Ann. Statist., 16:1709–1722
  14. Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. Annals of Probability, 20:1714–1730
  15. Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer, Berlin
  16. Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist., 39:1496–1525
  17. Cai, T. T. and Jiang, T. (2012). Phase transition in limiting distributions of coherence of high-dimensional random matrices. J. Multivariate Anal., 107:24–39
  18. Chang, J., Chen, X., and Wu, M. (2024). Central limit theorems for high dimensional dependent data. Bernoulli, 30:712–742
  19. Chang, J., Jiang, Q., and Shao, X. (2023). Testing the martingale difference hypothesis in high dimension. J. Econometrics, 235:972–1000
  20. Chang, J., Tang, C., and Wu, Y. (2013). Marginal empirical likelihood and sure independence feature screening. Ann. Statist., 41:2123–2148
  21. Chen, X. and Kato, K. (2019). Randomized incomplete U-statistics in high dimensions. Ann. Statist., 47:3127–3156
  22. Cheng, X. (2015). Robust inference in nonlinear models with mixed identification strength. J. Econometrics, 189:207–228
  23. Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist., 41:2786–2819
  24. Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probab. Theory Related Fields, 162:47–70
  25. Correia, S. (2016). A feasible estimator for linear models with multi-way fixed effects. Unpublished manuscript (scorreia.com/research/hdfe.pdf)
  26. Dudoit, S., Shaffer, J. P., and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci., 18:71–103
  27. Efron, B. (2006). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc., 99:96–104
  28. Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Sanz-Sole, M., Soria, J., Varona, J. L., and Verdera, J., editors, Proc. Internat. Congress of Mathematicians, volume III, pages 595–622, Zurich. European Mathematical Society
  29. Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B, 70:849–911
  30. Fan, J., Lv, J., and Qi, L. (2011). Sparse high-dimensional models in economics. Annual Econ. Rev., 3:291–317
  31. Gallant, A. R. and White, H. (1988). A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, New York
  32. Genovese, C. R., Jin, J., Wasserman, L., and Yao, Z. (2012). A comparison of the lasso and marginal regression. J. Mach. Learning Research, 13:2107–2143
  33. Giannone, D., Lenza, M., and Primiceri, G. E. (2021). Economic predictions with big data: The illusion of sparsity. Technical Report 2542, European Central Bank
  34. Giné, E. and Zinn, J. (1990). Bootstrapping general empirical measures. Ann. Probab., 18:851–869
  35. Hansen, B. E. (1996). Inference when a nuisance parameter is not identified under the null hypothesis. Econometrica, 64:413–430
  36. Hill, J. B. (2021). Weak identification robust wild bootstrap applied to a consistent model specification test. Economet. Theory, 37:409–463
  37. Hill, J. B. (2025a). Mixingale and physical dependence equality with applications. Statist. Probab. Letters, 221:110380
  38. Hill, J. B. (2025b). Testing many zero restrictions in a high dimensional linear regression setting. J. Bus. Econom. Statist., 43:55–67
  39. Hill, J. B. (2026a). Supplemental material for "a high dimensional wild bootstrap max-test for detecting the presence of significant predictors". Dept. of Economics, University of North Carolina - Chapel Hill
  40. Hill, J. B. (2026b). Supplemental material for "max-laws of large numbers for high dimensional arrays with applications". Dept. of Economics, University of North Carolina - Chapel Hill
  41. Hill, J. B. and Li, T. (2025). A bootstrapped test of covariance stationarity based on orthonormal transformations. Bernoulli, 31:1527–1551
  42. Hill, J. B. and Motegi, K. (2020). A max-correlation white noise for weakly dependent time series. Economet. Theory, 36:907–960
  43. Huang, T.-J., McKeague, I. W., and Qian, M. (2019). Marginal screening for high-dimensional predictors of survival outcomes. Stat. Sin., 29:2105–2139
  44. Jiang, T. (2004). The asymptotic distributions of the largest entries of sample correlation matrices. Ann. Appl. Probab., 14:865–880
  45. Kesten, H. (1973). Random difference equations and renewal theory for products of random matrices. Acta Mathematica, 131:207–248
  46. Kolesár, M., Müller, U. K., and Roelsgaard, S. T. (2024). The fragility of sparsity. Dept. of Economics, Princeton University
  47. Laber, E. and Murphy, S. A. (2011). Adaptive confidence intervals for the test error in classification. J. Amer. Statist. Assoc., 106:904–913
  48. Leeb, H. and Pötscher, B. M. (2006). Can one estimate the conditional distribution of post-model-selection estimators. Ann. Statist., 34:2554–2591
  49. Li, D., Liu, W., and Rosalsky, A. (2009). Necessary and sufficient conditions for the asymptotic distribution of the largest entry of a sample correlation matrix. Probab. Theory Related Fields, 148:5–35
  50. Liu, R. Y. (1988). Bootstrap procedures under some non-i.i.d. models. Ann. Statist., 16:1696–1708
  51. Liu, W., Lin, Z., and Shao, Q. (2008). The asymptotic distribution and Berry–Esseen bound of a new test for independence in high dimension with an application to stochastic optimization. Ann. Appl. Probab., 18:2337–2366
  52. Ljung, G. M. and Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65:297–303
  53. McCloskey, A. (2017). Bonferroni-based size-correction for nonstandard testing problems. J. Econometrics, 200:17–35
  54. McCloskey, A. (2020). Asymptotically uniform tests after consistent model selection in the linear regression model. J. Bus. Econom. Statist., 38:810–825
  55. McCloskey, A. (2024). Hybrid confidence intervals for informative uniform asymptotic inference after model selection. Biometrika, 111:109–127
  56. McKeague, I. and Qian, M. (2015). An adaptive resampling test for detecting the presence of significant predictors. J. Amer. Statist. Assoc., 110:1422–1433
  57. McKeague, I. and Zhang, I. (2022). Significance testing for canonical correlation analysis in high dimensions. Biometrika, 109:1076–1083
  58. McLeish, D. L. (1975). A maximal inequality and dependent strong laws. Ann. Probab., 3:829–839
  59. Nemirovski, A. S. (2000). Topics in nonparametric statistics. In Lectures on Probability Theory and Statistics. Springer, Berlin. Lecture Notes in Mathematics, vol. 1738
  60. Rio, E. (2017). Asymptotic Theory of Weakly Dependent Random Processes. Springer
  61. Sawa, T. (1978). Information criteria for discriminating among alternative regression models. Econometrica, 46:1273–1291
  62. Shao, Q.-M. and Zhou, W.-X. (2014). Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices. Ann. Probab., 42:623–648
  63. Shao, X. (2011). A bootstrap-assisted spectral test of white noise under unknown dependence. J. Econometrics, 162:213–224
  64. Tang, Y., Wang, H. J., and Barut, E. (2018). Testing for the presence of significant covariates through conditional marginal regression. Biometrika, 105:57–71
  65. Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press, Cambridge, UK
  66. White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50:1–25
  67. Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proc. Natl. Acad. Sci., 102:14150–14154
  68. Wu, W. B. and Min, M. (2005). On linear processes with dependent innovations. Stochastic Process. Appl., 115:939–958
  69. Wu, W. B. and Wu, Y. N. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron. J. Statist., 10:352–379
  70. Zhang, Y. and Laber, E. B. (2015). Comment: An adaptive resampling test for detecting the presence of significant predictors. J. Amer. Statist. Assoc., 110:1451–1454