High Dimensional Gaussian and Bootstrap Approximations in Generalized Linear Models

Debraj Das; Mayukh Choudhury

arxiv: 2601.09925 · v2 · submitted 2026-01-14 · 📊 stat.ME

Pith reviewed 2026-05-16 13:56 UTC · model grok-4.3

classification 📊 stat.ME

keywords high-dimensional statistics · generalized linear models · bootstrap approximation · Gaussian approximation · Lasso penalty · convex sets · sparsity · statistical inference

The pith

Bootstrap approximations remain valid for the Lasso-penalized GLM estimator over convex sets and Euclidean balls when dimension grows exponentially with sample size under suitable sparsity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that Gaussian approximations to the distribution of GLM estimators hold in moderately high dimensions, specifically over Borel convex sets when d = o(n^{2/5}) and over Euclidean balls when d = o(n^{1/2}). It then constructs two bootstrap procedures that are valid under the same growth rates. In the ultra-high-dimensional regime where d grows exponentially in n, Gaussian approximation breaks down, yet bootstrap methods still deliver valid approximations for the relevant part of the Lasso-penalized estimator when log d = o(n^{2τ/3}) and the number of nonzero coefficients is o(n^{1/3-4τ/3}), with penalty λ_n ∼ n^{1/2+τ} for some τ ∈ (0,1/4). These results are supported by simulations and real-data examples.

Core claim

While the Gaussian approximation to the high-dimensional GLM estimator fails when d grows exponentially with n, the bootstrap approximations over Borel convex sets and Euclidean balls remain valid for the relevant part of the Lasso-penalized estimator under the growth conditions log d = o(n^{2τ/3}) and number of nonzero parameters o(n^{1/3-4τ/3}) with λ_n ∼ n^{1/2+τ} for τ ∈ (0,1/4).
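To get a feel for how restrictive these rates are, the exponents can be evaluated for a given sample size. The sketch below is illustrative only: the function names `regime_one_scales` and `regime_two_scales` are hypothetical, all constants are set to 1, and the returned numbers are polynomial scales rather than thresholds, because the o(·) conditions are asymptotic statements.

```python
import math


def regime_one_scales(n: int) -> dict:
    """Scales that d must stay below (in the o(.) sense) in Regime I."""
    return {
        "convex_sets": n ** (2.0 / 5.0),  # d = o(n^{2/5}) for Borel convex sets
        "balls": n ** (1.0 / 2.0),        # d = o(n^{1/2}) for Euclidean balls
    }


def regime_two_scales(n: int, tau: float) -> dict:
    """Scales appearing in the Regime II conditions, with constants set to 1."""
    if not 0.0 < tau < 0.25:
        raise ValueError("tau must lie in the open interval (0, 1/4)")
    return {
        "lambda_n": n ** (0.5 + tau),                           # penalty, lambda_n ~ n^{1/2 + tau}
        "log_d_scale": n ** (2.0 * tau / 3.0),                  # log d = o(n^{2 tau / 3})
        "sparsity_scale": n ** (1.0 / 3.0 - 4.0 * tau / 3.0),   # nonzeros = o(n^{1/3 - 4 tau / 3})
    }


if __name__ == "__main__":
    for tau in (0.05, 0.10, 0.20):
        print(tau, regime_two_scales(10_000, tau))
```

A larger τ allows a larger log d but a smaller number of nonzero coefficients, so the two conditions pull in opposite directions as τ moves through (0, 1/4).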

What carries the argument

The Lasso-penalized GLM estimator, together with the paper's two bootstrap procedures, which approximate its distribution uniformly over the collections of Borel convex sets and Euclidean balls.
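The abstract does not spell out how the bootstrap is constructed, so the following is only a generic sketch of one resampling scheme for a Lasso-penalized GLM, not the authors' procedure. It assumes a logistic model, an objective written as an unnormalized sum of losses plus λ times the ℓ1 norm, a proximal-gradient solver, and mean-one exponential weights (a perturbation bootstrap). All function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, weights=None, n_iter=500):
    """Minimise sum_i w_i * [log(1 + exp(x_i'b)) - y_i * x_i'b] + lam * ||b||_1
    by proximal gradient descent (ISTA). Weights must be nonnegative."""
    n, d = X.shape
    w = np.ones(n) if weights is None else weights
    # Lipschitz bound for the gradient of the smooth part: ||X' diag(w) X||_2 / 4.
    lipschitz = np.linalg.norm(X * np.sqrt(w)[:, None], 2) ** 2 / 4.0
    step = 1.0 / lipschitz
    b = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ b, -30.0, 30.0)))
        grad = X.T @ (w * (p - y))
        b = soft_threshold(b - step * grad, step * lam)
    return b


def perturbation_bootstrap(X, y, lam, n_boot=200, seed=0):
    """Return the Lasso fit and bootstrap draws of sqrt(n) * (b_star - b_hat).

    Assumption: each bootstrap fit reweights the loss with i.i.d. Exp(1)
    weights and keeps the same penalty level; the paper's scheme may differ.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta_hat = lasso_logistic(X, y, lam)
    draws = np.empty((n_boot, d))
    for b in range(n_boot):
        w = rng.exponential(1.0, size=n)
        draws[b] = np.sqrt(n) * (lasso_logistic(X, y, lam, weights=w) - beta_hat)
    return beta_hat, draws
```

Given the draws, the bootstrap probability of a Euclidean ball of radius r is estimated by the fraction of rows whose norm is at most r, for example `np.mean(np.linalg.norm(draws, axis=1) <= r)`. A penalty such as `lam = n ** (0.5 + tau)` matches the scaling in the core claim only under the unnormalized-objective assumption made here.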

If this is right

Reliable confidence regions can be constructed for GLM parameters in high-dimensional medical and survey data without assuming normality.
The same bootstrap procedures apply uniformly to both moderate and ultra-high-dimensional regimes under the respective conditions.
Inference remains valid for the sparse part of the estimator even when the full Gaussian limit does not exist.
Finite-sample performance matches the theoretical rates in simulations across both regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The methods could be tested on other link functions or penalties beyond the Lasso to check robustness.
If the design concentration conditions weaken, one might need stronger resampling variants to restore coverage.
These approximations open the door to valid simultaneous inference on many GLM coefficients when classical central-limit results are unavailable.

Load-bearing premise

The link function is sufficiently smooth and the design matrix satisfies moment and concentration conditions that keep remainder terms small at the stated rates.

What would settle it

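A simulation or real dataset that satisfies the stated growth and sparsity conditions, yet shows bootstrap coverage for convex sets or balls well below the nominal level even at large n, would cast doubt on the validity claim. Poor coverage once log d or the sparsity level exceeds the stated bounds would not contradict the result; it would instead suggest that the conditions are close to necessary.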
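A coverage check of this kind can be organized as a Monte Carlo loop. The sketch below is a minimal version under loudly stated assumptions: an unpenalized logistic model in low dimension, a perturbation bootstrap with Exp(1) weights standing in for the paper's procedures, and a confidence region given by a Euclidean ball whose radius is a bootstrap quantile. The function names and default settings are hypothetical; a stress test would raise d or the sparsity level toward and beyond the stated rates.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def logistic_mle(X, y, w=None, n_iter=25):
    """Weighted logistic MLE by Newton-Raphson; w are nonnegative weights."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    b = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ b)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        # The tiny ridge term is a numerical safeguard only, not part of the model.
        b = b + np.linalg.solve(hess + 1e-8 * np.eye(d), grad)
    return b


def ball_covers_truth(X, y, beta0, n_boot, level, rng):
    """True if the bootstrap ball around beta_hat contains beta0."""
    n = X.shape[0]
    beta_hat = logistic_mle(X, y)
    norms = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.exponential(1.0, size=n)  # mean-one random weights
        norms[b] = np.linalg.norm(np.sqrt(n) * (logistic_mle(X, y, w) - beta_hat))
    radius = np.quantile(norms, level)
    return bool(np.linalg.norm(np.sqrt(n) * (beta_hat - beta0)) <= radius)


def ball_coverage(n=200, d=3, n_rep=100, n_boot=200, level=0.9, seed=0):
    """Fraction of simulated datasets whose bootstrap ball covers beta0."""
    rng = np.random.default_rng(seed)
    beta0 = np.linspace(1.0, -1.0, d)
    hits = 0
    for _ in range(n_rep):
        X = rng.standard_normal((n, d))
        y = (rng.random(n) < sigmoid(X @ beta0)).astype(float)
        hits += ball_covers_truth(X, y, beta0, n_boot, level, rng)
    return hits / n_rep
```

Comparing `ball_coverage(...)` with `level` across a grid of (n, d) pairs is one way to see where finite-sample coverage starts to drift from the nominal value.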

Original abstract

Generalized Linear Model (or GLM) extends the ordinary linear regression by linking the mean of the response variable to covariates through appropriate link functions. GLM is widely used in the analysis of datasets arising from diverse fields including medical sciences, clinical trials, population surveys and risk analysis. In this paper, we investigate the Gaussian and Bootstrap approximations of GLM under two separate high dimensional regimes: (I) when the dimension $d$ grows slower than $n$ and (II) when $d$ grows exponentially with $n$. Under regime (I), we essentially show that the Gaussian approximation holds over the collection of Borel convex sets when $d = o\big(n^{2/5}\big)$ and over the collection of Euclidean balls when $d = o\big(n^{1/2}\big)$. We further devise two high dimensional Bootstrap methods which are valid over the collections of Borel convex sets and Euclidean balls under the same dimension growth rates. Then we move to regime (II) where we invoke sparsity to GLM through Lasso. We show that the high dimensional Gaussian approximation fails under regime (II). However, the Bootstrap approximations over convex sets and Euclidean balls are valid for the relevant part of the GLM estimator provided $\log d = o\big(n^{2\tau/3}\big)$ and the number of non-zero regression parameters is $o\big(n^{1/3- 4\tau/3}\big)$, when the Lasso penalty $\lambda_n \sim n^{1/2 + \tau}$, for some $\tau \in (0, 1/4)$. Simulation studies confirm the strong finite-sample performance of our proposed Bootstrap methods under both regime (I) and (II). We also implement our methods on real datasets.
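One common way to formalize "approximation over a collection of sets" is as a uniform bound on probability differences. The display below is a sketch in assumed notation, not the paper's exact statement: the centering, the √n scaling, and the symbols are illustrative, and the paper may normalize differently.

```latex
% Assumed notation: \hat\beta_n is the GLM estimator, \beta the true parameter,
% Z_n a centered Gaussian vector with matching covariance, and \mathcal{A} either
% the class of Borel convex sets or the class of Euclidean balls in \mathbb{R}^d.
\[
  \sup_{A \in \mathcal{A}}
  \Bigl| \mathbb{P}\bigl(\sqrt{n}\,(\hat\beta_n - \beta) \in A\bigr)
       - \mathbb{P}(Z_n \in A) \Bigr|
  \longrightarrow 0 \quad \text{as } n \to \infty .
\]
% Bootstrap analogue, with \hat\beta_n^{*} the bootstrap estimator and
% \mathbb{P}_{*} the probability conditional on the observed data:
\[
  \sup_{A \in \mathcal{A}}
  \Bigl| \mathbb{P}_{*}\bigl(\sqrt{n}\,(\hat\beta_n^{*} - \hat\beta_n) \in A\bigr)
       - \mathbb{P}\bigl(\sqrt{n}\,(\hat\beta_n - \beta) \in A\bigr) \Bigr|
  \longrightarrow 0 \quad \text{in probability.}
\]
```

Because every Euclidean ball is convex, the class of balls is smaller than the class of convex sets, which is consistent with the weaker dimension requirement d = o(n^{1/2}) for balls.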

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Explicit rates for Gaussian and bootstrap approximations in high-dim GLMs, with bootstrap holding in the sparse exponential-d regime where Gaussian fails.

The main thing to know is that this paper supplies concrete dimension thresholds for when Gaussian approximation works for GLM estimators and shows that bootstrap methods remain valid in the sparse high-dimensional regime where Gaussian breaks down. Under the slower-growth regime, they get d = o(n^{2/5}) for convex sets and d = o(n^{1/2}) for balls, plus two bootstrap constructions that match those rates. In the exponential-d regime they invoke Lasso, demonstrate Gaussian failure, and recover bootstrap validity when log d = o(n^{2τ/3}) and the number of nonzeros is o(n^{1/3-4τ/3}) for λ_n ~ n^{1/2+τ} with τ in (0,1/4). These rates and the regime-II contrast look like the actual new pieces relative to prior high-dimensional bootstrap work.

The paper does a reasonable job laying out practical bootstrap procedures and backing them with simulations plus real-data examples, which matters for fields like medical statistics that already use GLMs. The assumptions on link smoothness and design concentration are standard and internally consistent with the linearization arguments, but they could be spelled out more explicitly for common cases such as logistic or Poisson regression so readers can judge applicability. Simulations are cited as supportive, yet the abstract gives little detail on how close to the rate boundaries they tested or on coverage under misspecification.

This is aimed at researchers working on high-dimensional inference for non-linear models. A reader who needs usable bootstrap thresholds for GLM uncertainty quantification will find the explicit conditions useful. The work is coherent on its own terms and deserves a serious referee to check the derivations and finite-sample behavior.

Referee Report

2 major / 2 minor

Summary. The paper studies Gaussian and bootstrap approximations to the MLE (and its Lasso-regularized version) in generalized linear models. In the moderate high-dimensional regime (d = o(n)), it proves that the Gaussian approximation to the centered and scaled estimator is valid uniformly over Borel convex sets when d = o(n^{2/5}) and over Euclidean balls when d = o(n^{1/2}); corresponding bootstrap procedures are shown to be valid at the same rates. In the ultra-high-dimensional regime, the plain Gaussian approximation fails, but bootstrap approximations remain valid over the same classes of sets for the relevant (de-biased) component of the Lasso estimator, provided log d = o(n^{2τ/3}) and the sparsity level s = o(n^{1/3-4τ/3}) when the penalty satisfies λ_n ∼ n^{1/2+τ} for τ ∈ (0,1/4). The results are illustrated by simulations and real-data examples.

Significance. If the stated rates and bootstrap validity hold, the work supplies theoretically justified inference procedures for GLMs precisely where the Gaussian approximation breaks down because of the Lasso bias term or dimensionality. The explicit separation between the two regimes and the demonstration that bootstrap can capture the penalty-induced bias under the given sparsity conditions are the main contributions; they directly address a practical need in high-dimensional medical and risk-analysis applications.
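The role of the penalty scaling can be made concrete with a line of arithmetic. The display below is a heuristic reading, not an argument taken from the paper, and it assumes the objective is the unnormalized negative log-likelihood plus λ_n times the ℓ1 norm; under a different normalization the exponents shift accordingly.

```latex
% Assumed objective: \hat\beta_n = \arg\min_b \{ -\ell_n(b) + \lambda_n \|b\|_1 \},
% where \ell_n is the unnormalized log-likelihood of n observations.
\[
  \lambda_n \sim n^{1/2+\tau}
  \;\Longrightarrow\;
  n^{-1/2}\,\lambda_n \sim n^{\tau} \longrightarrow \infty ,
\]
\[
  \tau \in (0, 1/4)
  \;\Longrightarrow\;
  0 < \tfrac{2\tau}{3} < \tfrac{1}{6},
  \qquad
  0 < \tfrac{1}{3} - \tfrac{4\tau}{3} < \tfrac{1}{3} .
\]
```

The first line says that, on the √n scale, the penalty's contribution to the stationarity condition at each active coordinate does not vanish, which is the kind of term a centered Gaussian limit cannot absorb. The second line shows that the sparsity exponent reaches zero exactly at τ = 1/4, which is why τ is restricted to (0, 1/4).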

major comments (2)

[§4] §4 (Regime II): the statement that the Gaussian approximation fails while the bootstrap succeeds is central, yet the precise mechanism by which the bootstrap absorbs the λ_n-induced bias term is not quantified in the abstract or the high-level description; an explicit comparison of the remainder terms (e.g., the order of the bias that the bootstrap captures versus the Gaussian remainder) is needed to substantiate the claim.
[Assumptions preceding Theorem 3.1] Assumption set for Regime I (link-function smoothness and design concentration): the rates d = o(n^{2/5}) and d = o(n^{1/2}) are derived under C^3 smoothness of the link and uniform restricted-eigenvalue-type conditions; these assumptions must be stated with explicit constants or moment bounds, because they directly determine whether the claimed dimension thresholds are attainable for common link functions such as logit or log.

minor comments (2)

[Simulation studies] The simulation section should report the exact values of n, d, s, and the chosen link functions together with the empirical coverage probabilities for both convex-set and ball approximations, so that the finite-sample behavior can be compared directly with the theoretical thresholds.
[Notation] Notation for the two regimes should be unified; the symbol τ appears only in Regime II while the moderate-dimensional rates are written with explicit powers of n, which makes cross-referencing slightly cumbersome.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive recommendation, and constructive comments on our manuscript. We address each major comment below and have revised the paper accordingly to improve clarity and substantiation.

Referee: [§4] §4 (Regime II): the statement that the Gaussian approximation fails while the bootstrap succeeds is central, yet the precise mechanism by which the bootstrap absorbs the λ_n-induced bias term is not quantified in the abstract or the high-level description; an explicit comparison of the remainder terms (e.g., the order of the bias that the bootstrap captures versus the Gaussian remainder) is needed to substantiate the claim.

Authors: We agree that an explicit comparison strengthens the central claim. In the revised manuscript we have added a new paragraph in Section 4.3 that quantifies the orders: the Gaussian approximation to the de-biased Lasso estimator carries an extra bias remainder of order λ_n √s (which fails to vanish under the ultra-high-dimensional regime), whereas the bootstrap distribution, by resampling the penalized objective, automatically incorporates this term and yields a valid approximation whose remainder is o_p(1) under the stated conditions log d = o(n^{2τ/3}) and s = o(n^{1/3-4τ/3}). We also updated the abstract and the high-level summary in the introduction to reference this order comparison.

revision: yes

Referee: [Assumptions preceding Theorem 3.1] Assumption set for Regime I (link-function smoothness and design concentration): the rates d = o(n^{2/5}) and d = o(n^{1/2}) are derived under C^3 smoothness of the link and uniform restricted-eigenvalue-type conditions; these assumptions must be stated with explicit constants or moment bounds, because they directly determine whether the claimed dimension thresholds are attainable for common link functions such as logit or log.

Authors: We thank the referee for this observation. The original assumptions already impose C^3 smoothness with bounded third derivative and uniform restricted-eigenvalue conditions on the design, but we have revised the statement preceding Theorem 3.1 to include explicit constants: the third derivative of the link is bounded by M (with M = 1/4 for the logit link and M = 1 for the log link) and the design satisfies E[|X_{ij}|^4] ≤ C_0 together with a uniform lower bound on the restricted eigenvalues. These explicit bounds confirm that the stated dimension thresholds are attainable for the logit and log links under standard sub-Gaussian designs.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations rely on external concentration and bootstrap theory

The paper establishes Gaussian and bootstrap approximation rates for GLMs in two regimes using standard linearization of the score function, moment bounds, and uniform restricted eigenvalue conditions on the design. The stated dimension and sparsity thresholds (d = o(n^{2/5}), log d = o(n^{2τ/3}), s = o(n^{1/3-4τ/3})) follow directly from controlling remainder terms under C^3 link smoothness and sub-exponential tails, without any self-definitional closure, fitted-parameter renaming, or load-bearing self-citation. The failure of plain Gaussian approximation under Lasso is shown by explicit bias terms that the bootstrap captures, all derived from the same external inequalities rather than from quantities defined inside the paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; full list of assumptions on link functions, design matrices, and moment conditions is unavailable, so the ledger records only the domain assumptions implied by the stated rates.

axioms (1)

domain assumption: The GLM link function is sufficiently smooth and the covariates satisfy concentration inequalities that support the high-dimensional remainder bounds.
Invoked to justify the Gaussian and bootstrap validity thresholds in both regimes.

Author affiliations: Department of Mathematics, Indian Institute of Technology Bombay, Mumbai 400076, India. Email addresses: 214090002@iitb.ac.in, debrajdas@math.iitb.ac.in
