Group Permutation Testing in Linear Model: Sharp Validity, Power Improvement, and Extension Beyond Exchangeability

Hongyi Zhou; Zhiheng Zhang; Zonghan Li

arxiv: 2601.17734 · v2 · submitted 2026-01-25 · 📊 stat.ME

Group Permutation Testing in Linear Model: Sharp Validity, Power Improvement, and Extension Beyond Exchangeability

Zonghan Li , Hongyi Zhou , Zhiheng Zhang This is my paper

Pith reviewed 2026-05-16 11:36 UTC · model grok-4.3

classification 📊 stat.ME

keywords group permutation testinglinear modelType I error controlPALMRTpermutation testsconformal inferencepower improvementexchangeability

0 comments

The pith

A grouped permutation test for linear regression coefficients controls Type I error at most 2α under exchangeable errors, with the factor proven sharp.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a group permutation framework for finite-sample testing of a single coefficient in the fixed-design linear model with potentially dependent or heterogeneous errors. Under exchangeability of the errors, grouped versions of the permutation-augmented linear model randomization test control Type I error at level at most 2α for any permutation group, and a worst-case construction shows that the factor of 2 is tight. The Type II error is expressed in terms of a design-dependent geometric separation that can be cast as a combinatorial optimization problem over permutation groups, with an explicit algorithm selecting groups that improve power over standard i.i.d. permutations. The same framework extends beyond exact exchangeability by linking to conformal inference, yielding weighted group tests whose validity bound degrades gracefully according to a weighted average of total variation distances between the error law and its group-permuted versions.

Core claim

For the model Y = Zβ + bX + ε the grouped version of PALMRT controls the Type I error probability at most 2α whenever the errors are exchangeable; the bound is attained in a worst-case construction. The Type II error is bounded via a geometric separation quantity that is minimized by choosing an appropriate permutation group; a constructive algorithm produces groups that are at least as powerful as i.i.d. permutations. The method is further extended to non-exchangeable errors by producing weighted group tests whose finite-sample Type I error is controlled by a weighted average of total variation distances between ε and its permuted versions, recovering exact validity when those distances are

What carries the argument

the group permutation framework that structures randomization tests around arbitrary permutation groups acting on the error vector, allowing unified analysis of validity, power, and robustness

If this is right

Type I error remains at most 2α for every choice of permutation group when errors are exchangeable.
The factor 2 is attained by an explicit worst-case construction, so the bound cannot be improved without further assumptions.
Type II error is governed by a geometric separation that can be minimized by solving a combinatorial optimization over groups.
A constructive algorithm selects permutation groups that yield power at least as high as i.i.d. permutations and often substantially higher under heavy-tailed designs.
Weighted group tests control Type I error by a weighted average of total variation distances, recovering exact validity when the distances vanish.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sharpness result implies that any practical implementation may need to target a smaller nominal α to guarantee a desired level.
The link to conformal inference suggests that similar weighted-group constructions could add robustness to other permutation-based procedures.
Design-dependent group optimization opens the possibility of adaptive testing rules that choose the group after seeing the design matrix.

Load-bearing premise

Errors must be exchangeable under the null to obtain the sharp Type I error bound of 2α.

What would settle it

A concrete permutation group G together with an exchangeable error distribution ε such that, for some design matrices and nominal level α, the rejection probability under the null exceeds 2α would falsify the validity claim.

Figures

Figures reproduced from arXiv: 2601.17734 by Hongyi Zhou, Zhiheng Zhang, Zonghan Li.

**Figure 1.** Figure 1: Type II Error for Gaussian X and ϵ, a=0.1 where the first equation holds because of the Proposition 4, when Pk ∈ PK, P T k ∈ PK. The above linear system has (K + 1) · p equations and n + p − 1 unknowns. Then, there always exists a non-zero solution if (K + 1)p < n + p − 1. Also from Proposition 4, we know that I ∈ PK, we denote P0 := I. The test statistic is defined as R0 = 1 K + 1 X K j=1 1{Y T η ∗ ≤ Y T … view at source ↗

**Figure 2.** Figure 2: Type II Error for t2 X and Gaussian ϵ, a=0.1 2. under the H0, and for any solution of (η ∗ , γ), we have P(R0 ≤ ⌈(1 − α)(K + 1)⌉ K + 1 ) = P(R0 ≤ Q1−α( X K k=0 1 K + 1 · δRk )) ≥ 1 − α, where Qτ (·) denotes the τ - quantile of its argument, δa denotes the point mass at a, and the first equation holds when there are no ties. Remark: From Theorem 9, we can see that P(R0 > ⌈(1−α)(K+1)⌉ K+1 ) < α. Compared wit… view at source ↗

**Figure 3.** Figure 3: Probability density of ∥HZei∥ 2 2 In addition to the theoretical guaranty of the optimization objective, we provide some numerical examples, to get a better understanding on why our algorithm performs better than uniformly randomly choosing permutations. A naive example Consider Z = Ip, Xi ∼ N (0, In), ϵ ∼ N (0, In) (for simplicity). Case 1: For π uniformly sampled from all the permutations, (I − HZZπ )X … view at source ↗

read the original abstract

We consider finite-sample inference for a single regression coefficient in the fixed-design linear model $Y = Z\beta + bX + \varepsilon$, where $\varepsilon\in\mathbb{R}^n$ may exhibit complex dependence or heterogeneity. We develop a group permutation framework, yielding a unified and analyzable randomization structure for linear-model testing. Under exchangeable errors, we place permutation-augmented regression tests within this group-theoretic setting and show that a grouped version of PALMRT controls Type I error at level at most $2\alpha$ for any permutation group; moreover, we provide an worst-case construction demonstrating that the factor $2$ is sharp and cannot be improved without additional assumptions. Second, we relate the Type II error to a design-dependent geometric separation. We formulate it as a combinatorial optimization problem over permutation groups and bound it under additional mild sub-Gaussian assumptions. For the Type II error upper bound control, we propose a constructive algorithm for the permutation strategy that is better (at least no worse) than the i.i.d. permutation, with simulations empirically indicating substantial power gains, especially under heavy-tailed designs. Finally, we extend group-based CPT and PALMRT beyond exchangeability by connecting rank-based randomization arguments to conformal inference. The resulting weighted group tests satisfy finite-sample Type I error bounds that degrade gracefully with a weighted average of total variation distances between $\varepsilon$ and its group-permuted versions, recovering exact validity when these discrepancies vanish and yielding quantitative robustness otherwise. Taken together, the group-permutation viewpoint provides a principled bridge from exact randomization validity to design-adaptive power and quantitative robustness under approximate symmetries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The group-permutation framework delivers a sharp 2α Type I bound with an explicit tightness example and a constructive algorithm that can improve power over plain permutations.

read the letter

The main point is a group-theoretic take on permutation tests for a single coefficient in the fixed-design linear model. Under exchangeable errors a grouped PALMRT keeps Type I error at most 2α for any permutation group, and the authors give a concrete worst-case construction showing the factor of 2 is tight and cannot be removed without extra assumptions. They also turn the power question into a combinatorial optimization over groups and supply an algorithm that is at least as good as i.i.d. permutations, with simulations indicating noticeable gains under heavy-tailed designs. The final section links the same randomization structure to conformal ideas to get finite-sample bounds that degrade with total-variation distance from exchangeability, recovering exact control when the distance is zero.

Referee Report

2 major / 3 minor

Summary. The paper develops a group permutation framework for finite-sample inference on a single regression coefficient in the fixed-design linear model Y = Zβ + bX + ε with possibly dependent errors. Under exchangeable errors, a grouped version of PALMRT is shown to control Type I error at level at most 2α for arbitrary permutation groups, with an explicit worst-case construction proving the factor of 2 is sharp. The Type II error is characterized geometrically as a combinatorial optimization over groups and bounded under sub-Gaussian tails; a constructive algorithm is proposed that improves (or at least matches) i.i.d. permutation power, with supporting simulations. The framework is extended beyond exact exchangeability via weighted group tests linked to conformal inference, yielding finite-sample Type I bounds that degrade continuously with total-variation distances to the permuted errors.

Significance. If the central claims hold, the work supplies a unified randomization-based approach that delivers sharp finite-sample validity, design-adaptive power gains, and quantitative robustness under approximate symmetries. The explicit sharpness construction and the algorithmic permutation strategy are particular strengths, as they convert abstract group-theoretic arguments into concrete, verifiable procedures without parameter tuning. The graceful degradation via total-variation bounds further bridges exact permutation tests to practical settings with mild dependence violations, potentially influencing methodology in causal inference and high-dimensional regression where exact control is valued.

major comments (2)

[Theorem 3.1] Theorem 3.1 (sharpness construction): the explicit worst-case error vector and design that attain exactly 2α are stated to satisfy exchangeability and the linear-model null; the manuscript should verify that this construction remains valid when the permutation group is a proper subgroup rather than the full symmetric group, as the bound is claimed for arbitrary groups.
[§4.2] §4.2, Algorithm 1: the claim that the proposed permutation strategy is at least as powerful as i.i.d. permutation rests on a combinatorial separation argument; a formal proof that the algorithm never decreases power (rather than only empirical evidence) is needed, because the Type-II improvement is presented as a core contribution.

minor comments (3)

[§2] Notation for the action of the permutation group on the design matrix (introduced in §2) is used before being fully defined; a short clarifying paragraph or diagram would improve readability.
[Simulations] Simulation section: the number of Monte Carlo replications, exact design parameters, and heavy-tail distributions should be reported in a table to allow direct reproduction of the reported power gains.
[§5] The total-variation extension in §5 recovers exact validity when discrepancies vanish, but the manuscript does not discuss how the weights are chosen in practice when the TV distances are estimated from data.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive feedback. We have addressed both major comments by adding clarifications and a formal proof in the revised manuscript.

read point-by-point responses

Referee: [Theorem 3.1] Theorem 3.1 (sharpness construction): the explicit worst-case error vector and design that attain exactly 2α are stated to satisfy exchangeability and the linear-model null; the manuscript should verify that this construction remains valid when the permutation group is a proper subgroup rather than the full symmetric group, as the bound is claimed for arbitrary groups.

Authors: The worst-case construction in Theorem 3.1 is formulated to achieve the bound of 2α for any permutation group, including proper subgroups. The error vector is chosen so that under the null, the test statistic exceeds the threshold with probability exactly 2α regardless of the group structure, as long as the group is non-trivial. We have added a paragraph following the theorem statement verifying this for subgroups such as the cyclic group generated by a single cycle, confirming the sharpness holds universally as claimed. revision: yes
Referee: [§4.2] §4.2, Algorithm 1: the claim that the proposed permutation strategy is at least as powerful as i.i.d. permutation rests on a combinatorial separation argument; a formal proof that the algorithm never decreases power (rather than only empirical evidence) is needed, because the Type-II improvement is presented as a core contribution.

Authors: We appreciate this observation. While the manuscript presents the combinatorial separation argument, we agree a self-contained formal proof enhances clarity. In the revision, we have added Theorem 4.2, which proves that the algorithm's selected group yields Type II error no larger than that of the full symmetric group (i.i.d. permutations). The proof proceeds by showing that the optimization includes the i.i.d. case as a feasible solution and selects at least as good a separator. The simulations are kept to demonstrate practical gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper establishes the grouped PALMRT Type I error bound of at most 2α directly from the group-theoretic randomization structure under exchangeable errors, with sharpness shown via an explicit worst-case construction that satisfies the linear model and exchangeability conditions without any reduction to fitted parameters or self-referential definitions. The Type II error is posed as a combinatorial optimization over permutation groups and bounded using sub-Gaussian tail assumptions on the design-dependent separation, yielding a constructive algorithm that improves on i.i.d. permutations. The extension to approximate exchangeability connects rank-based arguments to conformal inference via weighted total variation distances, recovering exact validity when discrepancies vanish. All steps rely on external combinatorial and geometric arguments rather than self-citations that bear the central load or ansatzes imported from prior author work; no equations reduce by construction to their inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claims rest on exchangeability of errors for exact control and sub-Gaussian assumptions for power bounds; no free parameters or invented entities are introduced.

axioms (2)

domain assumption Errors are exchangeable under the null hypothesis
Invoked to obtain the Type I error bound of at most 2α for any permutation group.
domain assumption Errors satisfy mild sub-Gaussian tail conditions
Used to bound the Type II error via geometric separation.

pith-pipeline@v0.9.0 · 5600 in / 1244 out tokens · 32345 ms · 2026-05-16T11:36:06.846218+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery and orbit embedding unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We develop a group permutation framework... P_K satisfies that for any P_i, P_j there exists k such that P_k = P_i P_j (Assumption 2).
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

P(ϕ ≤ α) ≤ 2α ... worst-case construction demonstrating that the factor 2 is sharp

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Partial Identification under High-Dimensional Potential Outcomes and Confounders via Optimal Transport
stat.ME 2026-05 unverdicted novelty 6.0

A subspace-decomposed optimal transport estimator using sliced Wasserstein distance on residuals to tighten partial identification bounds in high dimensions.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · cited by 1 Pith paper

[1]

2.I∈ P K, and for anyP k ∈ P K,P T k ∈ P K

For fixedP k ∈ {P0,· · ·, P K},P πk is a bijection. 2.I∈ P K, and for anyP k ∈ P K,P T k ∈ P K. Proof: From the definition, we know that for fixed k,P πk :P K 7→ P K. Then for anyj∈ {0,· · ·, K}, P πk(πj) =P jPk. SinceP k, Pj ∈ P K, by the Assumption 2,P πk(πj) =P jPk ∈ P K,P πk is injective. For anyi, j∈ {0,· · ·, K}andi̸=j, ifP πk(πi) =P πk(πj), thenP i...

work page
[2]

KX k=0 1 (X T −X T πk)(I−H ZZπk )X ≥ |b| −1 (X T πk −X T )(I−H ZZπk )ϵ # >2α , w.h.p. For suchbwe prove that at least one of the following must hold with high probability: 1 1 +K

= 1, so that the bound2αin Theorem 2 is tight bound. A.2 Proof of Theorem 9 Proof.Initailly, we recap the lemmas about quantiles. From the defintion ofR i, i= 0,· · ·, K, and the Assumption ofP K andϵ, it is easy to see that R0 d=R 1 d=· · · d=R K. 33 Recall that the quantile function ˆQ1−α of random variablesR0,· · ·, R K is the quantile function with re...

work page 2025
[3]

On the other hand, X T H ZZπ X+X T π (I−H ZZπ )X≤ 1 2 ∥(I−H Z)X∥2 2 +∥H ZX∥2 2 + n n−C(p+tr(P πk)) − C(p+tr(P π)) 2n−2C(p+tr(P πk)) vT πk v ≤ ∥H ZX∥2 2 + n n−C(p+tr(P πk)) 1 2 vT πk v+∥H Zπk v∥2 2 + n 2n−2C(p+tr(P πk)) ∥v∥2 2 , where the last inequality is becausevT πk v≥ −∥v∥ 2

work page
[4]

LetK ′ ={k|tr(P πk)≤m, 1 2 vT πk v+∥H Zπk v∥2 2 ≤λ 2(X, Z,P K, 1 4 α)} Then|K ′ | ≥(1− 1 4 α)(1 +K)− 1 8 α(1 +K) = (1− 3 8 α)(1 +K)

Now, suppose that|{π k|tr(Pπk)≤m}| ≥(1− 1 8 α)(1+ K)for somem≤o(n). LetK ′ ={k|tr(P πk)≤m, 1 2 vT πk v+∥H Zπk v∥2 2 ≤λ 2(X, Z,P K, 1 4 α)} Then|K ′ | ≥(1− 1 4 α)(1 +K)− 1 8 α(1 +K) = (1− 3 8 α)(1 +K). On the other hand, by Theorem 3, for anyk∈ K ′ , X T H ZZπk X+X T πk(I−H ZZπ k)X≤ n n−C(p+m) λ2(X, Z,P K, 1 2 α) +∥H ZX∥2 2 + n 2n−2C(p+m) ∥v∥2 2 38 holds w...

work page
[5]

Also, suppose these vectors satisfy the following conditions: (1):E(v i) = 0

=t i and letSbe any parameter such thatS≥ Pm i=1 ti. Also, suppose these vectors satisfy the following conditions: (1):E(v i) = 0. (2):∥v i∥2 ≤a, witha 2 ≤ S ln4 n. (3):∀w∈ S n−1, we have:S w :=Pm i=1 E (vT i w)2 ≤ S ln4 n. Then we have: for anyc, k >0, there existsCsuch that P " ∥ mX i=1 vi∥2 2 ≥ mX i=1 ti +cS # ≤Cn −k. Following this, if we regardu ′ i ...

work page
[6]

For eachS i, control| P j∈Si aj| ≤O( qPn i=1 a2 i )and|S i| ≥n 0.55 so thatE(|v T π v|)is near optimal andPk i=1 E(∥ui∥2 2)can be well approximated by an easier optimization problem of (48)

work page
[7]

Control P j∈Si a2 j ∈o(Pn j=1 a2 j)for allS i, satisfying (3) in proposition 6, to guaranty a high proba- bility bound of∥H Zvπ∥2 2 ≤(1 +o(1))E(∥H Zvπ∥2

work page
[8]

This is the key approach for removing the indicator function in (22)

= (1 +o(1))[∥v ∗∥2 2 +Pk i=1 E(∥ui∥2 2)]. This is the key approach for removing the indicator function in (22)

work page
[9]

40 Algorithm design for finding permutation groupNow we show how to find a valid solution of (48) given all the constraints mentioned above

Given the above constraints and guarantee, optimize over (48). 40 Algorithm design for finding permutation groupNow we show how to find a valid solution of (48) given all the constraints mentioned above. For simplicity, denoteci =a 2 i , ¯b= 1 n Pn i=1 bi, and¯c= 1 n Pn i=1 ci, and we obtain: kX i=1  X j∈Si 1 |Si| a2 j    X j∈Si wT j wj   = kX i=1...

work page
[10]

For each subsetS i,E[∥u i∥2 2]∈O( Pn i=j a2 j), and 1 |Si| P j∈Si a2 j is closed to 1 |J| P i∈J a2 i , so that E hPk i=1 ∥ui∥2 2 i can be controlled

work page
[11]

The core idea of Algorithm 5 is that, we construct a sequencei 1, i2, ..., im such that∥ Pl j=1(ai, bi)∥2 is small for alll, providing convenience to getting the partitionS i’s

Each|S i|is also not too small so that Lemma 13 holds. The core idea of Algorithm 5 is that, we construct a sequencei 1, i2, ..., im such that∥ Pl j=1(ai, bi)∥2 is small for alll, providing convenience to getting the partitionS i’s. 41 Algorithm 2Rearrange Input:(a 1, c1),(a 2, c2), ...,(a n, cn), 3 subsetsI 1, I2, I3 of{1,2, ..., n}and parameterM. LetS= ...

work page
[12]

When the conditions in proposition 6 hold, and additionallymax i{(vi −¯v)2} ≤ 1 polyln(n) Pn i=1(vi − ¯v)2 for some polynomial, our algorithm has a solution withλin 22 provably not worse than uniformly sampling permutations

work page
[13]

Now we explain how different distribution ofZas well asn, pinfluences the gap||J 2|(¯b2 − ¯b)(¯c2 −¯c) + |J3|(¯b3 − ¯b)(¯c3 −¯c)|ofλ 2 in (22)

The provable gap in which our solution surpasses the random permutation is presented by |J2|(¯b2 − ¯b)(¯c2 −¯c) +|J3|(¯b3 − ¯b)(¯c3 −¯c) −o(1)∥v∥ 2 2, which depends on the exactX, Z. Now we explain how different distribution ofZas well asn, pinfluences the gap||J 2|(¯b2 − ¯b)(¯c2 −¯c) + |J3|(¯b3 − ¯b)(¯c3 −¯c)|ofλ 2 in (22). The impact ofpRecall the optim...

work page
[14]

In particular, a heavy tail prevents the projectedL 2 norm of each standard basis from concentrating around its expectation, thereby allowing a larger performance gap

This will increase both of| ¯b2 − ¯b|and| ¯b3 − ¯b|. In particular, a heavy tail prevents the projectedL 2 norm of each standard basis from concentrating around its expectation, thereby allowing a larger performance gap. (a) n=200,p=40, Gaussian Z (b) n=200,p=40,t 1 Z (c) n=200,p=40,t 2 Z (d) n=200,p=50, Gaussian Z (e) n=200,p=50,t 1 Z (f) n=200,p=50,t 2 ...

work page
[15]

+ sP i /∈Ix2 iPn i=1 x2 i sP i /∈Iy2 iPn i=1 y2 i # ≤12|T ϵ|−2e−p Now it suffices to bound∥Zu∥ 2

work page
[16]

X i /∈I x2 i ≥t # ≤e 10c2(tr(Pπ)−t) . This leads to an upper bound ofP i /∈Ix2 i : P

We first consider allu∈ T ϵ. SinceE[e txi]≤e c2t2 for all t, by settingt= x 2c2 andt=− x 2c2 we obtain: P[|xi| ≥x]≤2e − x2 4c2 . 51 Therefore, whenλ < 1 4c2 , we can upper boundE[e λx2 i ]by E[eλx2 i ] = Z t≥0 P[eλx2 i ≥t]dt = 1 + Z t≥1 P[x2 i ≥ 1 λ lnt]dt ≤1 + Z t≥1 2e− 1 4c2λ lnt dt ≤1 + Z t≥1 2t− 1 4c2λ dt . = 1 + 2 1 4c2λ −1 = 1 + 8c2λ 1−4c 2λ . Letλ=...

work page
[17]

54 B.2.3 Formal version of Theorem 5 We provide Theorem 16, the formal version of Theorem 5 and complete its proof here

This implies 1 2 X T π (I−H Zπ)(I−H Z)−X T π (I−H ZZπ)X ≤ 1 2 ∥(I−H Z)X∥2 2 . 54 B.2.3 Formal version of Theorem 5 We provide Theorem 16, the formal version of Theorem 5 and complete its proof here. Theorem 16.For anyα∈(0, 1 2]andP K,λ 2(X, Z,P K, α)satisfies λ2(X, Z,P K, α)≥E 1 2 vT πk v+∥H Zvπk ∥2 2 −2α∥v∥ 2 2 . Furthermore, supposeπ i(i= 1,2, ..., m)is...

work page
[18]

Therefore, we obtain: E 1 2 vT πk v+∥H Zπk v∥2 2 ≤λ 2(X, Z,P K, α)·(1−α) +α·(2∥v∥ 2 2 +λ 2(X, Z,P K, α)) ≤λ 2(X, Z,P K, α) + 2α∥v∥2 2 . Whenπ i is sampled uniformly randomly fromP K, we have: P 1 2 vT πiv+∥H Zπi v∥2 2 <E πk 1 2 vT πk v+∥H Zvπk ∥2 2 −α∥v∥ 2 2 ≤1− 1 2 α , which is becauseλ 2(X, Z,P K, 1 2 α)≥E πk 1 2 vT πk v+∥H Zvπk ∥2 2 −α∥v∥ 2 2. Now we l...

work page
[19]

Lemma 17.Suppose that condition (1),(2),(3) are satisfied

ThenY i+1 =Y i + 2XT i vi+1 We first give a lemma that provides an upper bound onX i. Lemma 17.Suppose that condition (1),(2),(3) are satisfied. Then, for anyw∈ S n−1, we have: P " sup i |X T i w| ≤max r 4Sw ln 2 δ ,2aln 2 δ !# ≥1−δ Proof of Lemma 17Letx i =v T i w, with|x i| ≤aandE(x i) = 0. ConsiderE(e λxi), we show that for all |λ| ≤ 1 a, we have:E(e λ...

work page
[20]

mX i=1 ∥vi∥2 2 ≥ mX i=1 ti + 1 2 cS # ≤e (− 1 2 c+a2λ)λS By settingλ= min( c 4a2 , ln1.25 n S ), we obtain P

+λ 2E[∥vi+1∥4 2] ≤x i(1 +λt i+1 +λ 2a2ti+1) ≤x ieλti+1+λ2a2ti+1 . Sincex 1, we obtainE[x m]≤e λ Pm i=1 ti(1+a2λ), therefore P " mX i=1 ∥vi∥2 2 ≥ mX i=1 ti + 1 2 cS # ≤e (− 1 2 c+a2λ)λS By settingλ= min( c 4a2 , ln1.25 n S ), we obtain P " mX i=1 ∥vi∥2 2 ≥ mX i=1 ti + 1 2 cS # ≤e −min( c2S 4a2 , 1 4 ln1.25 n) Combining with the upper bound ofY i we finally...

work page
[21]

∥v∗∥2 2 + kX i=1 E[∥ui∥2 2] # −E[∥H Zvπ∥2 2] ≤o(1)

Therefore, the conditions in Theorem 12 hold. Now it remains to upper bound2< v ∗ +Pk i=1 E(ui),Pk i=1[ui −E(u i)]>. Letw ∗ =v ∗ +Pk i=1 E(ui), and letr i = (w ∗)T (ui −E(u i)), we have:E(r i) = 0,|r i| ≤ qP j∈Si(vj − 1 |Si| P l∈Si vl)2∥w∗∥2 ≤ qP j∈Si(vj −¯v)2∥w∗∥2,Pk i=1 E(r2 i )≤ ∥w ∗∥2 2 ·sup w∈S n−1 Sw ≤ 1 ln4 n ∥w∗∥2 2 Pn i=1(vi −¯v)2. Now, we show t...

work page
[22]

We next show that for anyS i ⊆I ′ s(s=1,2,3),|S i| ≥Ω(n 0.55)

Furthermore, letj 1 ≤ibe the largest integer such that∥ Pj1 j=1(ai, bi)∥2 2 ≥Pk i=1(a2 i +b2 i ) and∥ Pj1−1 j=1 (ai, bi)∥2 2 <Pk i=1(a2 i +b 2 i ), we obtain: ∥ iX j=1 (ai, bi)∥2 ≤ ∥ iX j=j1 (ai, bi)∥2 ≤ ∥ iX j=j1−1 (ai, bi)∥2 +∥(a j1, bj1)∥2 ≤2 vuut kX i=1 (a2 i +b 2 i ). We next show that for anyS i ⊆I ′ s(s=1,2,3),|S i| ≥Ω(n 0.55). By Lemma 19, we can ...

work page
[23]

This implies that λ2(X, Z,P K, 1 4 α)−λ 2(X, Z,P n, 1 2 α) ≤ |J2|(¯b2 − ¯b)(¯c2 −¯c) +|J3|(¯b3 − ¯b)(¯c3 −¯c) +O(α)∥v∥2 2 . Extension to estimation by finite samplesActually, bothP K and the set of all permutations contain too many permutations, so thatλ 2(X, Z,P n, 1 4 α)andλ 2(X, Z,P K, 1 4 α)cannot be directly computed. There- fore, we consider estimat...

work page 2012

[1] [1]

2.I∈ P K, and for anyP k ∈ P K,P T k ∈ P K

For fixedP k ∈ {P0,· · ·, P K},P πk is a bijection. 2.I∈ P K, and for anyP k ∈ P K,P T k ∈ P K. Proof: From the definition, we know that for fixed k,P πk :P K 7→ P K. Then for anyj∈ {0,· · ·, K}, P πk(πj) =P jPk. SinceP k, Pj ∈ P K, by the Assumption 2,P πk(πj) =P jPk ∈ P K,P πk is injective. For anyi, j∈ {0,· · ·, K}andi̸=j, ifP πk(πi) =P πk(πj), thenP i...

work page

[2] [2]

KX k=0 1 (X T −X T πk)(I−H ZZπk )X ≥ |b| −1 (X T πk −X T )(I−H ZZπk )ϵ # >2α , w.h.p. For suchbwe prove that at least one of the following must hold with high probability: 1 1 +K

= 1, so that the bound2αin Theorem 2 is tight bound. A.2 Proof of Theorem 9 Proof.Initailly, we recap the lemmas about quantiles. From the defintion ofR i, i= 0,· · ·, K, and the Assumption ofP K andϵ, it is easy to see that R0 d=R 1 d=· · · d=R K. 33 Recall that the quantile function ˆQ1−α of random variablesR0,· · ·, R K is the quantile function with re...

work page 2025

[3] [3]

On the other hand, X T H ZZπ X+X T π (I−H ZZπ )X≤ 1 2 ∥(I−H Z)X∥2 2 +∥H ZX∥2 2 + n n−C(p+tr(P πk)) − C(p+tr(P π)) 2n−2C(p+tr(P πk)) vT πk v ≤ ∥H ZX∥2 2 + n n−C(p+tr(P πk)) 1 2 vT πk v+∥H Zπk v∥2 2 + n 2n−2C(p+tr(P πk)) ∥v∥2 2 , where the last inequality is becausevT πk v≥ −∥v∥ 2

work page

[4] [4]

LetK ′ ={k|tr(P πk)≤m, 1 2 vT πk v+∥H Zπk v∥2 2 ≤λ 2(X, Z,P K, 1 4 α)} Then|K ′ | ≥(1− 1 4 α)(1 +K)− 1 8 α(1 +K) = (1− 3 8 α)(1 +K)

Now, suppose that|{π k|tr(Pπk)≤m}| ≥(1− 1 8 α)(1+ K)for somem≤o(n). LetK ′ ={k|tr(P πk)≤m, 1 2 vT πk v+∥H Zπk v∥2 2 ≤λ 2(X, Z,P K, 1 4 α)} Then|K ′ | ≥(1− 1 4 α)(1 +K)− 1 8 α(1 +K) = (1− 3 8 α)(1 +K). On the other hand, by Theorem 3, for anyk∈ K ′ , X T H ZZπk X+X T πk(I−H ZZπ k)X≤ n n−C(p+m) λ2(X, Z,P K, 1 2 α) +∥H ZX∥2 2 + n 2n−2C(p+m) ∥v∥2 2 38 holds w...

work page

[5] [5]

Also, suppose these vectors satisfy the following conditions: (1):E(v i) = 0

=t i and letSbe any parameter such thatS≥ Pm i=1 ti. Also, suppose these vectors satisfy the following conditions: (1):E(v i) = 0. (2):∥v i∥2 ≤a, witha 2 ≤ S ln4 n. (3):∀w∈ S n−1, we have:S w :=Pm i=1 E (vT i w)2 ≤ S ln4 n. Then we have: for anyc, k >0, there existsCsuch that P " ∥ mX i=1 vi∥2 2 ≥ mX i=1 ti +cS # ≤Cn −k. Following this, if we regardu ′ i ...

work page

[6] [6]

For eachS i, control| P j∈Si aj| ≤O( qPn i=1 a2 i )and|S i| ≥n 0.55 so thatE(|v T π v|)is near optimal andPk i=1 E(∥ui∥2 2)can be well approximated by an easier optimization problem of (48)

work page

[7] [7]

Control P j∈Si a2 j ∈o(Pn j=1 a2 j)for allS i, satisfying (3) in proposition 6, to guaranty a high proba- bility bound of∥H Zvπ∥2 2 ≤(1 +o(1))E(∥H Zvπ∥2

work page

[8] [8]

This is the key approach for removing the indicator function in (22)

= (1 +o(1))[∥v ∗∥2 2 +Pk i=1 E(∥ui∥2 2)]. This is the key approach for removing the indicator function in (22)

work page

[9] [9]

40 Algorithm design for finding permutation groupNow we show how to find a valid solution of (48) given all the constraints mentioned above

Given the above constraints and guarantee, optimize over (48). 40 Algorithm design for finding permutation groupNow we show how to find a valid solution of (48) given all the constraints mentioned above. For simplicity, denoteci =a 2 i , ¯b= 1 n Pn i=1 bi, and¯c= 1 n Pn i=1 ci, and we obtain: kX i=1  X j∈Si 1 |Si| a2 j    X j∈Si wT j wj   = kX i=1...

work page

[10] [10]

For each subsetS i,E[∥u i∥2 2]∈O( Pn i=j a2 j), and 1 |Si| P j∈Si a2 j is closed to 1 |J| P i∈J a2 i , so that E hPk i=1 ∥ui∥2 2 i can be controlled

work page

[11] [11]

The core idea of Algorithm 5 is that, we construct a sequencei 1, i2, ..., im such that∥ Pl j=1(ai, bi)∥2 is small for alll, providing convenience to getting the partitionS i’s

Each|S i|is also not too small so that Lemma 13 holds. The core idea of Algorithm 5 is that, we construct a sequencei 1, i2, ..., im such that∥ Pl j=1(ai, bi)∥2 is small for alll, providing convenience to getting the partitionS i’s. 41 Algorithm 2Rearrange Input:(a 1, c1),(a 2, c2), ...,(a n, cn), 3 subsetsI 1, I2, I3 of{1,2, ..., n}and parameterM. LetS= ...

work page

[12] [12]

When the conditions in proposition 6 hold, and additionallymax i{(vi −¯v)2} ≤ 1 polyln(n) Pn i=1(vi − ¯v)2 for some polynomial, our algorithm has a solution withλin 22 provably not worse than uniformly sampling permutations

work page

[13] [13]

Now we explain how different distribution ofZas well asn, pinfluences the gap||J 2|(¯b2 − ¯b)(¯c2 −¯c) + |J3|(¯b3 − ¯b)(¯c3 −¯c)|ofλ 2 in (22)

The provable gap in which our solution surpasses the random permutation is presented by |J2|(¯b2 − ¯b)(¯c2 −¯c) +|J3|(¯b3 − ¯b)(¯c3 −¯c) −o(1)∥v∥ 2 2, which depends on the exactX, Z. Now we explain how different distribution ofZas well asn, pinfluences the gap||J 2|(¯b2 − ¯b)(¯c2 −¯c) + |J3|(¯b3 − ¯b)(¯c3 −¯c)|ofλ 2 in (22). The impact ofpRecall the optim...

work page

[14] [14]

In particular, a heavy tail prevents the projectedL 2 norm of each standard basis from concentrating around its expectation, thereby allowing a larger performance gap

This will increase both of| ¯b2 − ¯b|and| ¯b3 − ¯b|. In particular, a heavy tail prevents the projectedL 2 norm of each standard basis from concentrating around its expectation, thereby allowing a larger performance gap. (a) n=200,p=40, Gaussian Z (b) n=200,p=40,t 1 Z (c) n=200,p=40,t 2 Z (d) n=200,p=50, Gaussian Z (e) n=200,p=50,t 1 Z (f) n=200,p=50,t 2 ...

work page

[15] [15]

+ sP i /∈Ix2 iPn i=1 x2 i sP i /∈Iy2 iPn i=1 y2 i # ≤12|T ϵ|−2e−p Now it suffices to bound∥Zu∥ 2

work page

[16] [16]

X i /∈I x2 i ≥t # ≤e 10c2(tr(Pπ)−t) . This leads to an upper bound ofP i /∈Ix2 i : P

We first consider allu∈ T ϵ. SinceE[e txi]≤e c2t2 for all t, by settingt= x 2c2 andt=− x 2c2 we obtain: P[|xi| ≥x]≤2e − x2 4c2 . 51 Therefore, whenλ < 1 4c2 , we can upper boundE[e λx2 i ]by E[eλx2 i ] = Z t≥0 P[eλx2 i ≥t]dt = 1 + Z t≥1 P[x2 i ≥ 1 λ lnt]dt ≤1 + Z t≥1 2e− 1 4c2λ lnt dt ≤1 + Z t≥1 2t− 1 4c2λ dt . = 1 + 2 1 4c2λ −1 = 1 + 8c2λ 1−4c 2λ . Letλ=...

work page

[17] [17]

54 B.2.3 Formal version of Theorem 5 We provide Theorem 16, the formal version of Theorem 5 and complete its proof here

This implies 1 2 X T π (I−H Zπ)(I−H Z)−X T π (I−H ZZπ)X ≤ 1 2 ∥(I−H Z)X∥2 2 . 54 B.2.3 Formal version of Theorem 5 We provide Theorem 16, the formal version of Theorem 5 and complete its proof here. Theorem 16.For anyα∈(0, 1 2]andP K,λ 2(X, Z,P K, α)satisfies λ2(X, Z,P K, α)≥E 1 2 vT πk v+∥H Zvπk ∥2 2 −2α∥v∥ 2 2 . Furthermore, supposeπ i(i= 1,2, ..., m)is...

work page

[18] [18]

Therefore, we obtain: E 1 2 vT πk v+∥H Zπk v∥2 2 ≤λ 2(X, Z,P K, α)·(1−α) +α·(2∥v∥ 2 2 +λ 2(X, Z,P K, α)) ≤λ 2(X, Z,P K, α) + 2α∥v∥2 2 . Whenπ i is sampled uniformly randomly fromP K, we have: P 1 2 vT πiv+∥H Zπi v∥2 2 <E πk 1 2 vT πk v+∥H Zvπk ∥2 2 −α∥v∥ 2 2 ≤1− 1 2 α , which is becauseλ 2(X, Z,P K, 1 2 α)≥E πk 1 2 vT πk v+∥H Zvπk ∥2 2 −α∥v∥ 2 2. Now we l...

work page

[19] [19]

Lemma 17.Suppose that condition (1),(2),(3) are satisfied

ThenY i+1 =Y i + 2XT i vi+1 We first give a lemma that provides an upper bound onX i. Lemma 17.Suppose that condition (1),(2),(3) are satisfied. Then, for anyw∈ S n−1, we have: P " sup i |X T i w| ≤max r 4Sw ln 2 δ ,2aln 2 δ !# ≥1−δ Proof of Lemma 17Letx i =v T i w, with|x i| ≤aandE(x i) = 0. ConsiderE(e λxi), we show that for all |λ| ≤ 1 a, we have:E(e λ...

work page

[20] [20]

mX i=1 ∥vi∥2 2 ≥ mX i=1 ti + 1 2 cS # ≤e (− 1 2 c+a2λ)λS By settingλ= min( c 4a2 , ln1.25 n S ), we obtain P

+λ 2E[∥vi+1∥4 2] ≤x i(1 +λt i+1 +λ 2a2ti+1) ≤x ieλti+1+λ2a2ti+1 . Sincex 1, we obtainE[x m]≤e λ Pm i=1 ti(1+a2λ), therefore P " mX i=1 ∥vi∥2 2 ≥ mX i=1 ti + 1 2 cS # ≤e (− 1 2 c+a2λ)λS By settingλ= min( c 4a2 , ln1.25 n S ), we obtain P " mX i=1 ∥vi∥2 2 ≥ mX i=1 ti + 1 2 cS # ≤e −min( c2S 4a2 , 1 4 ln1.25 n) Combining with the upper bound ofY i we finally...

work page

[21] [21]

∥v∗∥2 2 + kX i=1 E[∥ui∥2 2] # −E[∥H Zvπ∥2 2] ≤o(1)

Therefore, the conditions in Theorem 12 hold. Now it remains to upper bound2< v ∗ +Pk i=1 E(ui),Pk i=1[ui −E(u i)]>. Letw ∗ =v ∗ +Pk i=1 E(ui), and letr i = (w ∗)T (ui −E(u i)), we have:E(r i) = 0,|r i| ≤ qP j∈Si(vj − 1 |Si| P l∈Si vl)2∥w∗∥2 ≤ qP j∈Si(vj −¯v)2∥w∗∥2,Pk i=1 E(r2 i )≤ ∥w ∗∥2 2 ·sup w∈S n−1 Sw ≤ 1 ln4 n ∥w∗∥2 2 Pn i=1(vi −¯v)2. Now, we show t...

work page

[22] [22]

We next show that for anyS i ⊆I ′ s(s=1,2,3),|S i| ≥Ω(n 0.55)

Furthermore, letj 1 ≤ibe the largest integer such that∥ Pj1 j=1(ai, bi)∥2 2 ≥Pk i=1(a2 i +b2 i ) and∥ Pj1−1 j=1 (ai, bi)∥2 2 <Pk i=1(a2 i +b 2 i ), we obtain: ∥ iX j=1 (ai, bi)∥2 ≤ ∥ iX j=j1 (ai, bi)∥2 ≤ ∥ iX j=j1−1 (ai, bi)∥2 +∥(a j1, bj1)∥2 ≤2 vuut kX i=1 (a2 i +b 2 i ). We next show that for anyS i ⊆I ′ s(s=1,2,3),|S i| ≥Ω(n 0.55). By Lemma 19, we can ...

work page

[23] [23]

This implies that λ2(X, Z,P K, 1 4 α)−λ 2(X, Z,P n, 1 2 α) ≤ |J2|(¯b2 − ¯b)(¯c2 −¯c) +|J3|(¯b3 − ¯b)(¯c3 −¯c) +O(α)∥v∥2 2 . Extension to estimation by finite samplesActually, bothP K and the set of all permutations contain too many permutations, so thatλ 2(X, Z,P n, 1 4 α)andλ 2(X, Z,P K, 1 4 α)cannot be directly computed. There- fore, we consider estimat...

work page 2012