Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
Pith reviewed 2026-05-08 14:02 UTC · model grok-4.3
The pith
Two randomized Hadamard transforms bring every coordinate's marginal distribution within O(d^{-1/2}) of a standard Gaussian.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
After any input vector is rotated by the product of two independent randomized Hadamard matrices and then normalized, the one-dimensional marginal law of each fixed coordinate lies within O(d^{-1/2}) of N(0,1) in both Kolmogorov distance and 1-Wasserstein distance. Substituting these explicit bounds into the error expressions for DRIVE and QUIC-FL shows that the expected quantization distortion asymptotically matches the distortion obtained when the same schemes are preceded by a uniform random rotation. A parallel argument for three transforms shows that the covariance between any two fixed coordinates decays fast enough that any fixed-size, bounded VQ codebook attains the same expected error as under a uniform random rotation, up to an additive term that vanishes with the dimension.
What carries the argument
The composition of two (or three) independent randomized Hadamard transforms, together with explicit Kolmogorov and Wasserstein bounds on the resulting marginal coordinate distributions.
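The machinery above is easy to sketch end to end. Below is a minimal NumPy illustration (our own code, not the paper's) of composing two randomized Hadamard transforms and normalizing; `fwht` is the orthonormal fast Walsh-Hadamard transform, and the input length is assumed to be a power of 2.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = x.copy()
    d = len(x)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                # butterfly: replace the pair with its sum and difference
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x / np.sqrt(d)  # scale so the transform is orthonormal

def randomized_hadamard(x, rng):
    """One RHT: independent random sign flips followed by the Hadamard transform."""
    signs = rng.choice([-1.0, 1.0], size=len(x))
    return fwht(signs * x)

def two_rht(x, rng):
    """Compose two independent RHTs and normalize, as in the main theorem's setup."""
    y = randomized_hadamard(randomized_hadamard(x, rng), rng)
    return y / np.linalg.norm(y)
```

Because each RHT is orthogonal, the composition preserves the input norm, so the final normalization only removes the input's own scale.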
If this is right
- DRIVE and QUIC-FL achieve the same asymptotic compression performance as when preceded by uniform random rotations.
- Any fixed bounded codebook for vector quantization yields the same expected error with three RHTs as with a uniform random rotation, up to an additive term vanishing with dimension.
- The linear-time moment check lets a system use fewer than three transforms on typical inputs while retaining the worst-case guarantees.
- Practical implementations of gradient compression, model quantization, and KV-cache compression can replace slow uniform rotations with fast Hadamard-based preprocessing without asymptotic loss.
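The moment check in the third bullet needs a concrete statistic, which this summary does not specify. The following is a hypothetical O(d) sketch: compare the empirical excess kurtosis of the normalized coordinates against the Gaussian value of 0. Both the statistic and the threshold `tol` are our illustrative choices, not the paper's.

```python
import numpy as np

def gaussian_moment_check(y, tol=0.5):
    """Hypothetical O(d) gate: accept if the coordinates of the unit-norm vector y
    look Gaussian by their fourth moment (excess kurtosis near 0).
    Statistic and threshold are illustrative, not taken from the paper."""
    z = y * np.sqrt(len(y))      # rescale so coordinates are ~ N(0, 1) when ||y|| = 1
    m2 = np.mean(z ** 2)
    m4 = np.mean(z ** 4)
    return abs(m4 / m2 ** 2 - 3.0) < tol
```

A system could apply one RHT, run such a check, and apply further transforms only on failure, saving a pass on benign inputs while keeping the worst-case fallback.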
Where Pith is reading between the lines
- The same marginal-convergence argument may apply to other high-dimensional tasks that rely on near-Gaussian marginals after orthogonal preprocessing, such as approximate nearest-neighbor search.
- An adaptive choice of one versus two transforms based on the moment check could reduce average runtime on non-adversarial data without sacrificing the theoretical safety net.
- Extending the covariance decay analysis to higher-order moments or to joint distributions of small blocks could cover additional quantization schemes that currently lack guarantees.
Load-bearing premise
The performance guarantees of DRIVE, QUIC-FL, and the VQ analyses depend only on the marginal laws or pairwise covariances of the rotated coordinates and admit direct substitution of the new distance bounds without extra error terms.
What would settle it
Measure the empirical Kolmogorov distance of a fixed coordinate after two RHTs to a standard Gaussian for a sequence of growing dimensions d and verify whether the observed distance remains bounded by C/sqrt(d) for some constant C independent of d.
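The proposed experiment can be run in a few lines. Below is a minimal Monte Carlo sketch (our own code; the dimension, trial count, and basis-vector input are illustrative choices) that also exhibits the one-RHT failure mode the abstract mentions.

```python
import numpy as np
from math import erf

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    d = len(x)
    y = x.astype(float)
    h = 1
    while h < d:
        # vectorized butterfly over all blocks of size 2h
        y = y.reshape(-1, 2, h)
        y = np.concatenate([y[:, 0] + y[:, 1], y[:, 0] - y[:, 1]], axis=1).reshape(d)
        h *= 2
    return y / np.sqrt(d)

def rht_coord(x, rng, num_transforms):
    """First coordinate (rescaled by sqrt(d)) after composing RHTs and normalizing."""
    y = x
    for _ in range(num_transforms):
        y = fwht(rng.choice([-1.0, 1.0], size=len(y)) * y)
    y = y / np.linalg.norm(y)
    return np.sqrt(len(y)) * y[0]

def kolmogorov_distance(samples):
    """Sup-distance between the empirical CDF and the standard normal CDF."""
    s = np.sort(samples)
    n = len(s)
    phi = np.array([0.5 * (1.0 + erf(v / np.sqrt(2.0))) for v in s])
    upper = np.max(np.abs(np.arange(1, n + 1) / n - phi))
    lower = np.max(np.abs(np.arange(0, n) / n - phi))
    return max(upper, lower)

d, trials = 256, 2000
rng = np.random.default_rng(0)
x = np.zeros(d)
x[0] = 1.0  # a basis vector: the adversarial case where one RHT is known to fail
one_rht_dist = kolmogorov_distance(np.array([rht_coord(x, rng, 1) for _ in range(trials)]))
two_rht_dist = kolmogorov_distance(np.array([rht_coord(x, rng, 2) for _ in range(trials)]))
print(f"Kolmogorov distance to N(0,1): one RHT {one_rht_dist:.3f}, two RHTs {two_rht_dist:.3f}")
```

For the basis-vector input, a single RHT leaves the scaled coordinate at exactly ±1, so its Kolmogorov distance stays near 0.34 regardless of d, while two RHTs bring it down toward the Monte Carlo noise floor; repeating over growing d tests the C/sqrt(d) envelope.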
Original abstract
Uniform random rotations (URRs) are a common preprocessing step in modern quantization approaches used for gradient compression, inference acceleration, KV-cache compression, model weight quantization, and approximate nearest-neighbor search in vector databases. In practice, URRs are often replaced by randomized Hadamard transforms (RHTs), which preserve orthogonality while admitting fast implementations. The remaining issue is the performance for worst-case inputs. With a URR, each coordinate is individually distributed as a shifted beta distribution, which converges to a Gaussian distribution in high dimensions. Generally, one RHT is not suitable in the worst case, as individual coordinates can be far from these distributions. We show that after composing two RHTs on any $d$-sized input vector, the marginal distribution of every fixed coordinate of the normalized rotated vector is within $O(d^{-1/2})$ of a standard Gaussian both in Kolmogorov distance and in $1$-Wasserstein distance. We then plug these bounds into the analyses of modern compression schemes, namely DRIVE and QUIC-FL, and show that two RHTs achieve performance that asymptotically matches URRs. However, we show that two RHTs may not be sufficient for Vector Quantization (VQ), which often requires weak correlation across fixed-size blocks of coordinates (as opposed to only marginal distribution convergence for single coordinates). We prove that a composition of three RHTs leads to decaying coordinate covariance. This ensures that any fixed, bounded, multi-dimensional VQ codebook optimized for URRs has the same expected error when using three RHTs, up to an additive term that vanishes with the dimension. Finally, because practical inputs are rarely adversarial, we propose a linear-time ${O}(d)$ check on the input's moments to dynamically adapt the number of RHTs used at runtime to improve performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that composing two randomized Hadamard transforms (RHTs) on any d-dimensional input vector ensures that every fixed coordinate of the normalized output is within O(d^{-1/2}) of a standard Gaussian in both Kolmogorov and 1-Wasserstein distance. These marginal bounds are substituted into the existing analyses of DRIVE and QUIC-FL to conclude that two RHTs achieve asymptotic performance matching uniform random rotations (URRs). For vector quantization, three RHTs are shown to produce decaying coordinate covariances, ensuring that any fixed bounded VQ codebook optimized for URRs has matching expected error up to a vanishing additive term. A linear-time O(d) moment-based check is proposed to dynamically select the number of RHTs at runtime.
Significance. If the substitution of the marginal bounds is rigorous and the error terms vanish, the work supplies the missing theoretical justification for the widespread practical replacement of expensive uniform random rotations by fast RHTs in gradient compression, KV-cache quantization, and approximate nearest-neighbor search. The explicit O(d^{-1/2}) rates, the separate treatment of marginal versus joint (covariance) requirements, and the dynamic adaptation heuristic are concrete strengths. The derivation of the distributional bounds directly from the algebraic properties of Hadamard matrices and random sign vectors, without fitted parameters, is a positive contribution.
major comments (2)
- [analysis of DRIVE/QUIC-FL substitution] The central claim that two RHTs asymptotically match URR performance in DRIVE and QUIC-FL (abstract and the substitution paragraph following the marginal bounds) rests on direct insertion of the O(d^{-1/2}) Kolmogorov/Wasserstein distances. No explicit propagation of the error through the quantization-error or convergence-rate expressions is supplied; it is therefore unclear whether d-dependent factors (e.g., summation over d coordinates, Lipschitz constants of the test functions, or higher-moment terms) produce a vanishing total difference or a non-vanishing residual as d→∞.
- [VQ covariance argument] For the VQ result, the statement that three RHTs suffice because they produce “decaying coordinate covariance” (abstract) does not specify the decay rate or the precise dependence on block size. Without this, it is impossible to verify that the additive error term for any fixed bounded codebook indeed vanishes uniformly.
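The gap named in the first major comment is, in outline, a standard Lipschitz/Wasserstein argument. A hedged sketch of the missing step, assuming the scheme's per-coordinate error is an $L$-Lipschitz functional $f$ (an assumption the revision would need to verify):

```latex
% Kantorovich--Rubinstein duality bounds the substitution error by the W_1 distance:
\left| \mathbb{E}\,f(Y_i) - \mathbb{E}\,f(G) \right|
  \;\le\; L \cdot W_1(Y_i, G)
  \;\le\; L \, C \, d^{-1/2}, \qquad G \sim \mathcal{N}(0,1).
% Averaging a normalized error over the d coordinates preserves the rate:
\frac{1}{d} \sum_{i=1}^{d} \left| \mathbb{E}\,f(Y_i) - \mathbb{E}\,f(G) \right|
  \;\le\; L \, C \, d^{-1/2} \;=\; o(1),
% provided L does not grow with d --- exactly the condition the referee asks to check.
```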
minor comments (2)
- [statement of main theorem] The normalization step applied to the rotated vector is referenced but never written explicitly; the precise scaling factor should appear in the statement of the main distributional theorem.
- [abstract and introduction] Several O(·) statements in the abstract and introduction omit the dependence of the hidden constants on other parameters (e.g., the number of RHT compositions or the input norm).
Simulated Author's Rebuttal
We thank the referee for the thorough review and constructive feedback on our manuscript. The two major comments identify areas where additional rigor and explicit details will strengthen the presentation. We address each point below and will revise the manuscript to incorporate the necessary clarifications and expansions.
Point-by-point responses
Referee: The central claim that two RHTs asymptotically match URR performance in DRIVE and QUIC-FL (abstract and the substitution paragraph following the marginal bounds) rests on direct insertion of the O(d^{-1/2}) Kolmogorov/Wasserstein distances. No explicit propagation of the error through the quantization-error or convergence-rate expressions is supplied; it is therefore unclear whether d-dependent factors (e.g., summation over d coordinates, Lipschitz constants of the test functions, or higher-moment terms) produce a vanishing total difference or a non-vanishing residual as d→∞.
Authors: We agree that the current substitution paragraph performs a direct insertion of the marginal bounds without a fully expanded propagation analysis. The manuscript relies on the fact that the existing DRIVE and QUIC-FL error expressions involve expectations of coordinate-wise functions whose sensitivity is controlled (typically Lipschitz with constants that do not introduce non-vanishing d-factors after normalization). However, to make the vanishing of the total difference fully rigorous, we will add an explicit error-propagation lemma in the revision. This lemma will bound the difference in quantization error or convergence rates by a multiple of the marginal distance, confirming that all d-dependent terms (sums over coordinates, Lipschitz factors, moment terms) yield an overall o(1) residual as d→∞. The addition will follow the marginal-bounds section. revision: yes
Referee: For the VQ result, the statement that three RHTs suffice because they produce “decaying coordinate covariance” (abstract) does not specify the decay rate or the precise dependence on block size. Without this, it is impossible to verify that the additive error term for any fixed bounded codebook indeed vanishes uniformly.
Authors: The referee is correct that the abstract and high-level statement do not quantify the covariance decay rate or its dependence on block size. The appendix proof establishes that three RHTs produce coordinate covariances that decay with dimension, which is sufficient for the additive error to vanish for any fixed bounded codebook. In the revision we will extract the explicit decay rate from the proof, state its dependence on block size, and verify that the resulting additive term vanishes uniformly as d→∞ whenever the block size remains fixed (or grows slower than d). These details will be added to the abstract, the VQ theorem statement, and the surrounding discussion. revision: yes
Circularity Check
No circularity: new algebraic concentration bounds on RHT marginals are derived independently and substituted into prior analyses without reduction to fits or self-definitions.
Full rationale
The paper establishes O(d^{-1/2}) Kolmogorov and Wasserstein bounds on each marginal coordinate after two RHTs via direct analysis of Hadamard matrix properties and random sign vectors. These bounds are then inserted into the existing DRIVE and QUIC-FL error analyses to obtain asymptotic equivalence with URRs. The VQ extension similarly introduces a fresh covariance decay result for three RHTs. No quoted step reduces a claimed prediction to a fitted parameter, self-citation chain, or definitional tautology; the core distributional claims rest on external probabilistic facts rather than renaming or smuggling prior results. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Randomized Hadamard transforms are orthogonal and can be computed in O(d log d) time via the fast Walsh-Hadamard transform with random sign flips.
- domain assumption The performance of DRIVE, QUIC-FL, and VQ depends on coordinate marginals or pairwise covariances in a manner that permits direct substitution of distribution-distance bounds.
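The first axiom can be checked numerically in a few lines; a minimal sketch (the `fwht` below is our own orthonormal implementation, not the paper's code):

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    d = len(x)
    y = x.astype(float)
    h = 1
    while h < d:
        y = y.reshape(-1, 2, h)
        y = np.concatenate([y[:, 0] + y[:, 1], y[:, 0] - y[:, 1]], axis=1).reshape(d)
        h *= 2
    return y / np.sqrt(d)

d = 16
# materialize the orthonormal Hadamard matrix column by column
H = np.column_stack([fwht(e) for e in np.eye(d)])
# random sign flips: D is diagonal with +-1 entries
D = np.diag(np.random.default_rng(0).choice([-1.0, 1.0], size=d))
# an RHT matrix H @ D is orthogonal: (HD)^T (HD) = D H^T H D = I
err = np.max(np.abs((H @ D).T @ (H @ D) - np.eye(d)))
```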