A Goodness-of-Fit Test for Independent Component Models in High Dimensions

Miles E. Lopes; Mingshuo Liu; Siyao Wang

arxiv: 2605.20099 · v1 · pith:VIKCPNE2new · submitted 2026-05-19 · 🧮 math.ST · stat.ME· stat.TH

A Goodness-of-Fit Test for Independent Component Models in High Dimensions

Mingshuo Liu , Siyao Wang , Miles E. Lopes This is my paper

Pith reviewed 2026-05-20 03:35 UTC · model grok-4.3

classification 🧮 math.ST stat.MEstat.TH

keywords goodness of fitindependent component modelshigh-dimensional statisticsasymptotic theorypre-whitening avoidancemultivariate data analysis

0 comments

The pith

A goodness-of-fit test for independent component models remains valid when data dimension and sample size grow proportionally.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a goodness-of-fit test for independent component models that carries a theoretical validity guarantee in high-dimensional regimes where the number of dimensions and observations increase at the same rate. The key innovation is that the test avoids a pre-whitening step, which typically causes problems for other tests in such settings. A reader might care because independent component models are commonly used to analyze multivariate data in fields like signal processing and machine learning, and having a reliable way to check model fit is crucial for accurate interpretation. The authors support their claims with numerical experiments showing good size and power, plus applications to gene-expression data.

Core claim

We develop the first goodness-of-fit test for IC models that is supported by a theoretical validity guarantee when the data dimension and sample size diverge proportionally. This is made possible by the fact that the test does not rely on a pre-whitening step, which often limits the applicability of other goodness-of-fit tests in high dimensions.

What carries the argument

A test statistic for assessing fit to an independent component model that is constructed without a preliminary whitening transformation, allowing asymptotic analysis under proportional high-dimensional growth.

If this is right

The test maintains correct size and has power to detect deviations in simulations across various conditions.
It can be applied to gene-expression data for diagnostic purposes in practice.
The absence of pre-whitening extends the range of scenarios where IC model validation is feasible.
Theoretical results establish the limiting distribution under the null when p and n diverge proportionally.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If this test performs well, it could lead to more robust use of ICA in high-dimensional data analysis pipelines.
Similar strategies might be adapted for goodness-of-fit in other models that suffer from preprocessing issues in high dimensions.
Future work could investigate the test's behavior under specific violations of the IC assumptions.

Load-bearing premise

The independent components and the mixing matrix obey regularity conditions that make the test statistic's limiting distribution valid as dimension and sample size grow in proportion.

What would settle it

Observe whether the test statistic converges to the predicted limiting distribution in a high-dimensional dataset generated from an IC model that violates the regularity conditions on the components.

Figures

Figures reproduced from arXiv: 2605.20099 by Miles E. Lopes, Mingshuo Liu, Siyao Wang.

**Figure 1.** Figure 1: Median p-values versus gene-subset size d ∈ {10, 20, . . . , 300} for four tissues: Testis, Colon–Sigmoid, Stomach, Pancreas. Acknowledgements We are grateful to Xin Bing, Derek Latremouille, David Matteson, Klaus Nordhausen and Lida Wang for helpful correspondence. References Anglada-Girotto, M., S. Miravet-Verde, L. Serrano, and S. A. Head (2022). robustica: customizable robust independent component anal… view at source ↗

read the original abstract

Independent component (IC) models are a standard tool for representing multivariate data in statistics, signal processing, and machine learning. Despite the extensive use of IC models, much less attention has been given to goodness-of-fit tests for assessing their compatibility with data. We develop the first goodness-of-fit test for IC models that is supported by a theoretical validity guarantee when the data dimension and sample size diverge proportionally. This is made possible by the fact that the test does not rely on a pre-whitening step, which often limits the applicability of other goodness-of-fit tests in high dimensions. Our theoretical analysis is complemented with numerical experiments that demonstrate the test's size control and power under a range of conditions. In addition, we provide examples involving gene-expression data to illustrate that the test has potential for effective diagnostic use in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a GOF test for IC models with asymptotic validity in the p/n to gamma regime that skips pre-whitening, but the moment and tail conditions on components are not spelled out in the abstract.

read the letter

The main contribution is a goodness-of-fit test for independent component models that has a limiting null distribution when dimension and sample size grow proportionally, without needing a pre-whitening step. This removes a practical barrier that has restricted earlier tests in high-dimensional settings like genomics or signal processing. The work backs the claim with theoretical analysis and reports simulations that check size control and power across conditions, plus a gene-expression example to show diagnostic potential. That combination of theory and illustration is useful for people who actually fit these models to data. The approach appears to engage directly with the proportional asymptotics problem rather than sidestepping it. One soft spot is the regularity conditions. The abstract leaves unspecified the exact moment or tail requirements on the independent components and any spectral conditions on the mixing matrix. If the proof uses only second moments or assumes sub-exponential tails, the result could fail for the heavier-tailed distributions common in the cited applications. The simulations are described as covering a range, but without details on the tail behaviors tested it is difficult to judge how far the guarantee extends. This paper is for statisticians and machine learning researchers who need diagnostics for latent variable models in high dimensions. A reader focused on ICA or similar factor models would get a concrete tool and some evidence it works in practice. The central argument looks internally consistent from what is shown, with no obvious circularity. I would send it to peer review so referees can check the full derivations and experimental coverage.

Referee Report

1 major / 3 minor

Summary. The manuscript develops a goodness-of-fit test for independent component models X = A S (with S having independent entries) that possesses a limiting null distribution when both dimension p and sample size n tend to infinity with p/n → γ. The construction deliberately avoids a pre-whitening step. Theoretical analysis establishing the limiting distribution is supplemented by Monte Carlo experiments that examine size and power, together with illustrations on gene-expression data.

Significance. If the central asymptotic result holds under appropriately stated conditions, the work supplies the first GOF procedure for IC models with a rigorous validity guarantee in the proportional high-dimensional regime. The avoidance of pre-whitening removes a common practical obstacle and could therefore increase the diagnostic utility of IC models in genomics and signal processing. The combination of theory, controlled simulations, and real-data examples constitutes a coherent contribution.

major comments (1)

[§3, Theorem 3.1] §3, Theorem 3.1 (or the main limiting-distribution result): the derivation of the null limiting distribution implicitly relies on moment or tail conditions on the entries of S and spectral conditions on A that are never stated explicitly. If only second-moment assumptions are used, the CLT or concentration step underlying the limit can fail for the heavy-tailed distributions typical of the gene-expression examples in §6; the manuscript must list explicit regularity conditions (e.g., finite fourth moments or sub-exponential tails) and verify that they are compatible with the cited applications.

minor comments (3)

[Abstract] The abstract claims the test is 'the first' with a theoretical guarantee; a brief literature sentence acknowledging any related high-dimensional GOF procedures would improve context.
[Simulation section] Simulation tables: empirical rejection rates under the null are reported without standard errors; adding Monte Carlo standard errors would clarify whether observed size deviations are statistically meaningful.
[§2] Notation: the symbol for the test statistic is introduced without an explicit definition equation; adding a displayed equation at first use would aid readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and the constructive comment on the regularity conditions. We address the point below and will revise the manuscript to strengthen the presentation of the theoretical results.

read point-by-point responses

Referee: [§3, Theorem 3.1] §3, Theorem 3.1 (or the main limiting-distribution result): the derivation of the null limiting distribution implicitly relies on moment or tail conditions on the entries of S and spectral conditions on A that are never stated explicitly. If only second-moment assumptions are used, the CLT or concentration step underlying the limit can fail for the heavy-tailed distributions typical of the gene-expression examples in §6; the manuscript must list explicit regularity conditions (e.g., finite fourth moments or sub-exponential tails) and verify that they are compatible with the cited applications.

Authors: We agree that the moment and tail conditions should be stated explicitly. The proof of Theorem 3.1 relies on the existence of finite fourth moments of the entries of S (to justify the central limit theorem for the relevant quadratic forms) together with sub-exponential tail bounds that guarantee the necessary concentration inequalities; the mixing matrix A is assumed to have eigenvalues bounded away from zero and infinity uniformly in the high-dimensional regime. In the revised manuscript we will insert a new paragraph immediately preceding Theorem 3.1 that lists these conditions in full. For the gene-expression illustrations in §6 we will add a short verification paragraph that reports the sample fourth moments and kurtosis values computed from the data sets; these quantities remain finite and are consistent with the stated assumptions, although we will also note that the conditions are sufficient rather than necessary and that the test may retain practical utility even under moderate departures from sub-exponential tails. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained from first principles

full rationale

The paper presents a new goodness-of-fit test for IC models with a claimed asymptotic validity result under p/n → γ without pre-whitening. No quoted equations, self-citations, or fitted parameters in the abstract or context reduce the limiting distribution or test statistic to an input quantity by construction. The central claim rests on a theoretical analysis that is independent of the target result, with numerical experiments provided as separate validation. This is the common case of an honest methodological derivation in high-dimensional statistics; unstated regularity conditions affect applicability but do not constitute circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

pith-pipeline@v0.9.0 · 5672 in / 1126 out tokens · 37794 ms · 2026-05-20T03:35:18.508674+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

var(X₁ᵀ A X₁) = 2∥Σ^{1/2} A Σ^{1/2}∥_F² + (E(Z_{11}⁴)−3) Σ_j (eⱼᵀ Σ^{1/2} A Σ^{1/2} eⱼ)² (Lemma 20); constraint (3); U-statistic h in (17); Thm 1 under Assump. 1 (8+δ moments, p/n→γ, r(Σ−I)=o(√p))
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

entire development (pre-whitening avoidance, shrinkage, high-dim CLT for fourth-degree polynomials)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1]

+ 2∥Σ∥2 F 2var(∥X1∥2

work page
[2]

· Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj .(12) Regarding the second factor on the right side, it follows from Lemma 1 that Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj = 1 +o P(1).(13) The proof is completed by Lemma 2, which shows that √n Un −var(∥X 1∥2

work page
[3]

Lemma 1.If Assumption 1 holds, then asn→ ∞, pX j=1 ˆΣ2 jj −Σ 2 jj =O P(1)and Pp j=1 ˆΣ2 jjPp j=1 Σ2 jj P − →1

+ 2∥Σ∥2 F 2var(∥X1∥2 2) L − →N(0,1) (14) 18 asn→ ∞. Lemma 1.If Assumption 1 holds, then asn→ ∞, pX j=1 ˆΣ2 jj −Σ 2 jj =O P(1)and Pp j=1 ˆΣ2 jjPp j=1 Σ2 jj P − →1. In addition, the same statements hold when ˆΣis replaced by ˜Σor ˇΣ. Proof.Consider the algebraic identity pX j=1 ˆΣ2 jj −Σ 2 jj =∥Diag( ˆΣ)−Diag(Σ)∥ 2 F + 2 tr Diag(Σ)[Diag(ˆΣ)−Diag(Σ)] .(15) B...

work page 2019
[4]

+ 2∥Σ∥2 F 2var(∥X1∥2 2) L − →N(0,1), Proof.Define the kernel function h(X1, X2) = 1 2 − 2(n−2) n2 ∥X1∥2 2 − ∥X2∥2 2 2 − 2(n−2) n (X ⊤ 1 X2)2.(17) so thatU n can be represented in the following form as a U statistic, Un = 1n/2 2 X 1≤i<j≤n/2 h(Xi, Xj).(18) Next, define the function h1(X1) =E(h(X 1, X2)|X1)−E(h(X 1, X2)),(19) 20 and letL n denote the H´ ajek...

work page 2000
[5]

Using Lemma 20, direct calculations give E(Un) =E(h(X 1, X2)) = 1 2 − 2(n−2) n2 ·2·var(∥X 1∥2 2)− 2(n−2) n E (X ⊤ 1 X2)2 = (var(∥X1∥2 2)−2∥Σ∥ 2 F ) +O(n −1)(var(∥X1∥2

+o P(1).(23) The limit (22) is established in Lemma 5, and so it remains to prove (23). Using Lemma 20, direct calculations give E(Un) =E(h(X 1, X2)) = 1 2 − 2(n−2) n2 ·2·var(∥X 1∥2 2)− 2(n−2) n E (X ⊤ 1 X2)2 = (var(∥X1∥2 2)−2∥Σ∥ 2 F ) +O(n −1)(var(∥X1∥2

work page
[6]

(24) Furthermore, Lemma 3 implies that var(Un) (4/n)var2(∥X1∥2

+∥Σ∥ 2 F ). (24) Furthermore, Lemma 3 implies that var(Un) (4/n)var2(∥X1∥2

work page
[7]

Combining (24) and (25), as well as the fact that∥Σ∥ 2 F ≲var(∥X 1∥2

→1,(25) asn→ ∞. Combining (24) and (25), as well as the fact that∥Σ∥ 2 F ≲var(∥X 1∥2

work page
[8]

Lemma 3.Let the statisticsU n andL n be as defined in(11)and(20)respectively

under Assumption 1, it follows that (23) holds. Lemma 3.Let the statisticsU n andL n be as defined in(11)and(20)respectively. If Assumption 1 holds, then asn→ ∞, var(Ln) var(Un) →1,(26) 21 and var(Ln) (4/n)var2(∥X1∥2

work page
[9]

Thus, the limit (26) will follow if we can show var(h(X1, X2)) =o(nvar(h 1(X1))).(28) We will establish this by showing var(h(X 1, X2))≲∥Σ∥ 4 F and var(h1(X1))≍ ∥Σ∥ 4 F below

→1.(27) Proof.It is a classical fact (van der Vaart, 2000, p.163) that the variances ofU n andL n can be expressed as var(Un) = 2(n/2−2) (n/2 2 ) var(h1(X1)) + 1 (n/2 2 )var(h(X1, X2)) var(Ln) = 4 n/2var(h1(X1)), where we recall thathandh 1 are defined in (17) and (19) respectively. Thus, the limit (26) will follow if we can show var(h(X1, X2)) =o(nvar(h ...

work page 2000
[10]

It follows that var(h1(X1)) = var E h(X1, X2)|X1 = ( 1 4 +O( 1 n))var ∥X1∥2 2 −tr(Σ) 2 + (4 +O( 1 n))var(X⊤ 1 ΣX1) −(2 +O( 1 n))cov ∥X1∥2 2 −tr(Σ) 2 , X⊤ 1 ΣX1

with tr(Σ) 2 to write the second factor in parentheses as (∥X 1∥2 2 −tr(Σ)) 2 when calculating var(h1(X1)). It follows that var(h1(X1)) = var E h(X1, X2)|X1 = ( 1 4 +O( 1 n))var ∥X1∥2 2 −tr(Σ) 2 + (4 +O( 1 n))var(X⊤ 1 ΣX1) −(2 +O( 1 n))cov ∥X1∥2 2 −tr(Σ) 2 , X⊤ 1 ΣX1 . (30) 22 Lemma 4 shows that var ∥X1∥2 2 −tr(Σ) 2 = (2 +o(1))var 2(∥X1∥2 2) ≍ ∥Σ∥ 4 F , (...

work page
[11]

Meanwhile, Lemma 20 and Assumption 1 imply var(X⊤ 1 ΣX1)≲tr(Σ 4) =o(∥Σ∥ 4 F ),(32) and so the term var(X⊤ 1 ΣX1) is negligible in (30)

= 1 in Assumption 1. Meanwhile, Lemma 20 and Assumption 1 imply var(X⊤ 1 ΣX1)≲tr(Σ 4) =o(∥Σ∥ 4 F ),(32) and so the term var(X⊤ 1 ΣX1) is negligible in (30). Likewise, the Cauchy-Schwarz inequality implies that the covariance term in (30) is also negligible, and hence var(h1(X1)) = ( 1 2 +o(1))var 2(∥X1∥2 2)≍ ∥Σ∥ 4 F .(33) This verifies (28) and completes ...

work page 2000
[12]

By Lemma 20 and the conditions in Assumption 1, we have var(∥X1∥2 2)≍ ∥Σ∥ 2 F (40) and var(∥X1∥2 2)−2∥Σ∥ 2 F ≲∥Σ∥ 2 F 25 which imply √n var(∥X1∥2 2)−2∥Σ∥ 2 F var(∥X1∥2

· Pp j=1(Σ2 jj − ˇΣ2 jj)Pp j=1 Σ2 jj · Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj . By Lemma 20 and the conditions in Assumption 1, we have var(∥X1∥2 2)≍ ∥Σ∥ 2 F (40) and var(∥X1∥2 2)−2∥Σ∥ 2 F ≲∥Σ∥ 2 F 25 which imply √n var(∥X1∥2 2)−2∥Σ∥ 2 F var(∥X1∥2

work page
[13]

Next, since we have Pp j=1 Σ2 jj ≳punder Assumption 1, it follows from Lemma 1 that Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj =O P(1) and Pp j=1 Σ2 jj − ˇΣ2 jjPp j=1 Σ2 jj =O P( 1 p)

=O( √n). Next, since we have Pp j=1 Σ2 jj ≳punder Assumption 1, it follows from Lemma 1 that Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj =O P(1) and Pp j=1 Σ2 jj − ˇΣ2 jjPp j=1 Σ2 jj =O P( 1 p). Altogether, we conclude that ϵn,1 σn =O P( √n p ) =o P(1), as needed. Lemma 8.If Assumption 1 holds, then asn→ ∞ ϵn,2 σn =o P(1). Proof.Recall that ϵn,2 = 1 ∥Σ1/2∥4 4 ˜E(∥X1∥4 4)−...

work page
[14]

=o(p 3/2).(42) Proof.We begin with some notation and preliminary observations. LetAandBdenote thep×pmatrices that satisfy Σ =I+Aand Σ 1/2 =I+B.(43) The eigenvalues ofBcan be expressed asλ j(B) = p 1 +λ j(A)−1, and so the bound 26 |√1 +x−1| ≤ |x|for allx≥ −1 implies |λj(B)| ≤ |λ j(A)|(44) for allj= 1, . . . , p. Next, letB j ∈R p denote thejth column ofBso...

work page
[15]

We now proceed to bound each of the termst 0,

= pX j=1 X4 1j −E(X 4 1j) L2 ≲ 4X k=0 pX j=1 Z(4−k) 1j (B⊤ j Z1)k −E Z(4−k) 1j (B⊤ j Z1)k L2 =: 4X k=0 tk. We now proceed to bound each of the termst 0, . . . , t4 in the last sum. The quantityt 0 is simply the standard deviation of a sum of centered i.i.d. random variables and so t0 = q var(Z4 11)p≲ √p. Next, to boundt 1, we have t1 ≲ pX j=1 Z3 1j(B⊤ j Z...

work page 2018
[16]

Combining the last several steps shows thatP j̸=k E(˜Σ4 jk) =o(n 1/2)

implies ∥˜Σjk −Σ jk∥L4 = 1 n/2 P i> n 2 XijXik −Σ jk L4 ≲max n−1/2 q var(X1jX1k), n−1 nX i> n 2 ∥XijXik −Σ jk∥4 L4 1/4 ≲n −1/2, where we have used∥X 1j∥L8 ≲1, which can be established using an argument similar to (45). Combining the last several steps shows thatP j̸=k E(˜Σ4 jk) =o(n 1/2). Applying this to (51) completes the proof. Lemma 13.If Assumption 1...

work page 2000
[17]

Indeed, since ˆ var(∥X1∥4

=o P(p2). Indeed, since ˆ var(∥X1∥4

work page
[18]

To this end, the unbiasedness of ˆ var(∥X1∥4

is a non-negative random variable, the negligibility will follow if we can check thatE( ˆ var(∥X 1∥4 4)) =o(p 2). To this end, the unbiasedness of ˆ var(∥X1∥4

work page
[19]

and Lemma 9 implyE( ˆ var(∥X1∥4 4)) = var(∥X1∥4

work page
[20]

It remains to show that the first term on the right side of (58) converges to 1 in probability

=o(p 3/2). It remains to show that the first term on the right side of (58) converges to 1 in probability. From the definition ofσ 2 n in (9) this term may be expressed as 4( ˆ var(∥X1∥2 2))2 nσ2 n(Pp j=1 ˆΣ2 jj)2 = ˆ var(∥X1∥2 2) var(∥X1∥2 2) 2 Pp j=1 Σ2 jj Pp j=1 ˆΣ2 jj 2 . Lemma 1 shows that the second factor on the right is 1 +o P(1), while Lemma S3 i...

work page 2019
[21]

Meanwhile, if we takeA=B= Σ 1/2eke⊤ k Σ1/2 withe k ∈R p denoting thek th standard basis vector, then summing overk= 1,

= 2∥Σ∥2 F + (E(Z4 11)−3) pX j=1 Σ2 jj . Meanwhile, if we takeA=B= Σ 1/2eke⊤ k Σ1/2 withe k ∈R p denoting thek th standard basis vector, then summing overk= 1, . . . , pgives pX k=1 E(X4 1k)−3Σ 2 kk = E(Z4 11)−3 pX k=1 pX j=1 (Σ1/2)4 jk. Finally, eliminatingE(Z 4 11)−3 from the previous two equations leads to the stated result. Lemma 21(Bai and Silverstein...

work page 2010

[1] [1]

+ 2∥Σ∥2 F 2var(∥X1∥2

work page

[2] [2]

· Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj .(12) Regarding the second factor on the right side, it follows from Lemma 1 that Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj = 1 +o P(1).(13) The proof is completed by Lemma 2, which shows that √n Un −var(∥X 1∥2

work page

[3] [3]

Lemma 1.If Assumption 1 holds, then asn→ ∞, pX j=1 ˆΣ2 jj −Σ 2 jj =O P(1)and Pp j=1 ˆΣ2 jjPp j=1 Σ2 jj P − →1

+ 2∥Σ∥2 F 2var(∥X1∥2 2) L − →N(0,1) (14) 18 asn→ ∞. Lemma 1.If Assumption 1 holds, then asn→ ∞, pX j=1 ˆΣ2 jj −Σ 2 jj =O P(1)and Pp j=1 ˆΣ2 jjPp j=1 Σ2 jj P − →1. In addition, the same statements hold when ˆΣis replaced by ˜Σor ˇΣ. Proof.Consider the algebraic identity pX j=1 ˆΣ2 jj −Σ 2 jj =∥Diag( ˆΣ)−Diag(Σ)∥ 2 F + 2 tr Diag(Σ)[Diag(ˆΣ)−Diag(Σ)] .(15) B...

work page 2019

[4] [4]

+ 2∥Σ∥2 F 2var(∥X1∥2 2) L − →N(0,1), Proof.Define the kernel function h(X1, X2) = 1 2 − 2(n−2) n2 ∥X1∥2 2 − ∥X2∥2 2 2 − 2(n−2) n (X ⊤ 1 X2)2.(17) so thatU n can be represented in the following form as a U statistic, Un = 1n/2 2 X 1≤i<j≤n/2 h(Xi, Xj).(18) Next, define the function h1(X1) =E(h(X 1, X2)|X1)−E(h(X 1, X2)),(19) 20 and letL n denote the H´ ajek...

work page 2000

[5] [5]

Using Lemma 20, direct calculations give E(Un) =E(h(X 1, X2)) = 1 2 − 2(n−2) n2 ·2·var(∥X 1∥2 2)− 2(n−2) n E (X ⊤ 1 X2)2 = (var(∥X1∥2 2)−2∥Σ∥ 2 F ) +O(n −1)(var(∥X1∥2

+o P(1).(23) The limit (22) is established in Lemma 5, and so it remains to prove (23). Using Lemma 20, direct calculations give E(Un) =E(h(X 1, X2)) = 1 2 − 2(n−2) n2 ·2·var(∥X 1∥2 2)− 2(n−2) n E (X ⊤ 1 X2)2 = (var(∥X1∥2 2)−2∥Σ∥ 2 F ) +O(n −1)(var(∥X1∥2

work page

[6] [6]

(24) Furthermore, Lemma 3 implies that var(Un) (4/n)var2(∥X1∥2

+∥Σ∥ 2 F ). (24) Furthermore, Lemma 3 implies that var(Un) (4/n)var2(∥X1∥2

work page

[7] [7]

Combining (24) and (25), as well as the fact that∥Σ∥ 2 F ≲var(∥X 1∥2

→1,(25) asn→ ∞. Combining (24) and (25), as well as the fact that∥Σ∥ 2 F ≲var(∥X 1∥2

work page

[8] [8]

Lemma 3.Let the statisticsU n andL n be as defined in(11)and(20)respectively

under Assumption 1, it follows that (23) holds. Lemma 3.Let the statisticsU n andL n be as defined in(11)and(20)respectively. If Assumption 1 holds, then asn→ ∞, var(Ln) var(Un) →1,(26) 21 and var(Ln) (4/n)var2(∥X1∥2

work page

[9] [9]

Thus, the limit (26) will follow if we can show var(h(X1, X2)) =o(nvar(h 1(X1))).(28) We will establish this by showing var(h(X 1, X2))≲∥Σ∥ 4 F and var(h1(X1))≍ ∥Σ∥ 4 F below

→1.(27) Proof.It is a classical fact (van der Vaart, 2000, p.163) that the variances ofU n andL n can be expressed as var(Un) = 2(n/2−2) (n/2 2 ) var(h1(X1)) + 1 (n/2 2 )var(h(X1, X2)) var(Ln) = 4 n/2var(h1(X1)), where we recall thathandh 1 are defined in (17) and (19) respectively. Thus, the limit (26) will follow if we can show var(h(X1, X2)) =o(nvar(h ...

work page 2000

[10] [10]

It follows that var(h1(X1)) = var E h(X1, X2)|X1 = ( 1 4 +O( 1 n))var ∥X1∥2 2 −tr(Σ) 2 + (4 +O( 1 n))var(X⊤ 1 ΣX1) −(2 +O( 1 n))cov ∥X1∥2 2 −tr(Σ) 2 , X⊤ 1 ΣX1

with tr(Σ) 2 to write the second factor in parentheses as (∥X 1∥2 2 −tr(Σ)) 2 when calculating var(h1(X1)). It follows that var(h1(X1)) = var E h(X1, X2)|X1 = ( 1 4 +O( 1 n))var ∥X1∥2 2 −tr(Σ) 2 + (4 +O( 1 n))var(X⊤ 1 ΣX1) −(2 +O( 1 n))cov ∥X1∥2 2 −tr(Σ) 2 , X⊤ 1 ΣX1 . (30) 22 Lemma 4 shows that var ∥X1∥2 2 −tr(Σ) 2 = (2 +o(1))var 2(∥X1∥2 2) ≍ ∥Σ∥ 4 F , (...

work page

[11] [11]

Meanwhile, Lemma 20 and Assumption 1 imply var(X⊤ 1 ΣX1)≲tr(Σ 4) =o(∥Σ∥ 4 F ),(32) and so the term var(X⊤ 1 ΣX1) is negligible in (30)

= 1 in Assumption 1. Meanwhile, Lemma 20 and Assumption 1 imply var(X⊤ 1 ΣX1)≲tr(Σ 4) =o(∥Σ∥ 4 F ),(32) and so the term var(X⊤ 1 ΣX1) is negligible in (30). Likewise, the Cauchy-Schwarz inequality implies that the covariance term in (30) is also negligible, and hence var(h1(X1)) = ( 1 2 +o(1))var 2(∥X1∥2 2)≍ ∥Σ∥ 4 F .(33) This verifies (28) and completes ...

work page 2000

[12] [12]

By Lemma 20 and the conditions in Assumption 1, we have var(∥X1∥2 2)≍ ∥Σ∥ 2 F (40) and var(∥X1∥2 2)−2∥Σ∥ 2 F ≲∥Σ∥ 2 F 25 which imply √n var(∥X1∥2 2)−2∥Σ∥ 2 F var(∥X1∥2

· Pp j=1(Σ2 jj − ˇΣ2 jj)Pp j=1 Σ2 jj · Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj . By Lemma 20 and the conditions in Assumption 1, we have var(∥X1∥2 2)≍ ∥Σ∥ 2 F (40) and var(∥X1∥2 2)−2∥Σ∥ 2 F ≲∥Σ∥ 2 F 25 which imply √n var(∥X1∥2 2)−2∥Σ∥ 2 F var(∥X1∥2

work page

[13] [13]

Next, since we have Pp j=1 Σ2 jj ≳punder Assumption 1, it follows from Lemma 1 that Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj =O P(1) and Pp j=1 Σ2 jj − ˇΣ2 jjPp j=1 Σ2 jj =O P( 1 p)

=O( √n). Next, since we have Pp j=1 Σ2 jj ≳punder Assumption 1, it follows from Lemma 1 that Pp j=1 Σ2 jj Pp j=1 ˇΣ2 jj =O P(1) and Pp j=1 Σ2 jj − ˇΣ2 jjPp j=1 Σ2 jj =O P( 1 p). Altogether, we conclude that ϵn,1 σn =O P( √n p ) =o P(1), as needed. Lemma 8.If Assumption 1 holds, then asn→ ∞ ϵn,2 σn =o P(1). Proof.Recall that ϵn,2 = 1 ∥Σ1/2∥4 4 ˜E(∥X1∥4 4)−...

work page

[14] [14]

=o(p 3/2).(42) Proof.We begin with some notation and preliminary observations. LetAandBdenote thep×pmatrices that satisfy Σ =I+Aand Σ 1/2 =I+B.(43) The eigenvalues ofBcan be expressed asλ j(B) = p 1 +λ j(A)−1, and so the bound 26 |√1 +x−1| ≤ |x|for allx≥ −1 implies |λj(B)| ≤ |λ j(A)|(44) for allj= 1, . . . , p. Next, letB j ∈R p denote thejth column ofBso...

work page

[15] [15]

We now proceed to bound each of the termst 0,

= pX j=1 X4 1j −E(X 4 1j) L2 ≲ 4X k=0 pX j=1 Z(4−k) 1j (B⊤ j Z1)k −E Z(4−k) 1j (B⊤ j Z1)k L2 =: 4X k=0 tk. We now proceed to bound each of the termst 0, . . . , t4 in the last sum. The quantityt 0 is simply the standard deviation of a sum of centered i.i.d. random variables and so t0 = q var(Z4 11)p≲ √p. Next, to boundt 1, we have t1 ≲ pX j=1 Z3 1j(B⊤ j Z...

work page 2018

[16] [16]

Combining the last several steps shows thatP j̸=k E(˜Σ4 jk) =o(n 1/2)

implies ∥˜Σjk −Σ jk∥L4 = 1 n/2 P i> n 2 XijXik −Σ jk L4 ≲max n−1/2 q var(X1jX1k), n−1 nX i> n 2 ∥XijXik −Σ jk∥4 L4 1/4 ≲n −1/2, where we have used∥X 1j∥L8 ≲1, which can be established using an argument similar to (45). Combining the last several steps shows thatP j̸=k E(˜Σ4 jk) =o(n 1/2). Applying this to (51) completes the proof. Lemma 13.If Assumption 1...

work page 2000

[17] [17]

Indeed, since ˆ var(∥X1∥4

=o P(p2). Indeed, since ˆ var(∥X1∥4

work page

[18] [18]

To this end, the unbiasedness of ˆ var(∥X1∥4

is a non-negative random variable, the negligibility will follow if we can check thatE( ˆ var(∥X 1∥4 4)) =o(p 2). To this end, the unbiasedness of ˆ var(∥X1∥4

work page

[19] [19]

and Lemma 9 implyE( ˆ var(∥X1∥4 4)) = var(∥X1∥4

work page

[20] [20]

It remains to show that the first term on the right side of (58) converges to 1 in probability

=o(p 3/2). It remains to show that the first term on the right side of (58) converges to 1 in probability. From the definition ofσ 2 n in (9) this term may be expressed as 4( ˆ var(∥X1∥2 2))2 nσ2 n(Pp j=1 ˆΣ2 jj)2 = ˆ var(∥X1∥2 2) var(∥X1∥2 2) 2 Pp j=1 Σ2 jj Pp j=1 ˆΣ2 jj 2 . Lemma 1 shows that the second factor on the right is 1 +o P(1), while Lemma S3 i...

work page 2019

[21] [21]

Meanwhile, if we takeA=B= Σ 1/2eke⊤ k Σ1/2 withe k ∈R p denoting thek th standard basis vector, then summing overk= 1,

= 2∥Σ∥2 F + (E(Z4 11)−3) pX j=1 Σ2 jj . Meanwhile, if we takeA=B= Σ 1/2eke⊤ k Σ1/2 withe k ∈R p denoting thek th standard basis vector, then summing overk= 1, . . . , pgives pX k=1 E(X4 1k)−3Σ 2 kk = E(Z4 11)−3 pX k=1 pX j=1 (Σ1/2)4 jk. Finally, eliminatingE(Z 4 11)−3 from the previous two equations leads to the stated result. Lemma 21(Bai and Silverstein...

work page 2010