Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Chiheb Yaakoubi; Cosme Louart; Malik Tiomoko; Zhenyu Liao

arxiv: 2604.03146 · v2 · pith:5FJPKEKXnew · submitted 2026-04-03 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-13 18:25 UTC · model grok-4.3

keywords: empirical risk minimization · high-dimensional estimation · Gaussian universality · convex optimization · non-Gaussian data designs · min-max theorems · asymptotic analysis

The pith

In high-dimensional ERM with non-Gaussian data, the estimator's projection on a test point approximately follows the convolution of a generally non-Gaussian distribution with an independent Gaussian whose variance is Tr(C_θ̂ E[xx⊤]), the trace of the estimator's covariance times the second moment of the test covariate.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper characterizes how Gaussian universality breaks down in high-dimensional convex empirical risk minimization when the data design is non-Gaussian. By heuristically extending the Convex Gaussian Min-Max Theorem, it gives an asymptotic description of the estimator's mean and covariance, and shows that the projection onto an independent test covariate is distributed approximately as the convolution of the non-Gaussian mean projection with a Gaussian noise term. A reader would care because this pins down when Gaussian approximations of an estimator's scores are valid and what form the deviation takes for realistic, non-Gaussian data distributions.
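The convolution statement can be written out as a short worked equation. The symbols σ, g, f, m, and v are notation introduced here for illustration and are not taken from the paper; the special case at the end is the standard fact that a sum of independent Gaussians is Gaussian, which shows where Gaussian universality is recovered.

```latex
% Predicted law of the score for a test covariate x independent of the training data
\hat\theta^\top x \;\overset{d}{\approx}\; \mu_{\hat\theta}^\top x + \sigma g,
\qquad g \sim \mathcal{N}(0,1) \ \text{independent of } x,
\qquad \sigma^2 = \operatorname{Tr}\!\big(C_{\hat\theta}\,\mathbb{E}[xx^\top]\big).

% If \mu_{\hat\theta}^\top x has density f, the predicted density of the score is the convolution
p(t) \;\approx\; \int_{\mathbb{R}} f(s)\,\frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\Big(-\frac{(t-s)^2}{2\sigma^2}\Big)\,\mathrm{d}s.

% Special case: if \mu_{\hat\theta}^\top x \sim \mathcal{N}(m, v), then
\hat\theta^\top x \;\overset{d}{\approx}\; \mathcal{N}\big(m,\; v + \sigma^2\big).
```

In this reading, the score is Gaussian exactly when the mean projection μ_θ̂⊤x is Gaussian; any skewness or multimodality of μ_θ̂⊤x survives the convolution, only smoothed by the Gaussian term.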

Core claim

Under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, for a test covariate x independent of the training data, the projection θ̂⊤x approximately follows the convolution of the (generally non-Gaussian) distribution of μ_θ̂⊤x with an independent centered Gaussian variable of variance Tr(C_θ̂ E[xx⊤]). This is obtained by heuristically extending the CGMT to non-Gaussian settings.
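The following is a minimal sampling sketch of the predicted law, using only the Python standard library. The values of `mu`, `C`, the dimension `P`, and the bimodal test distribution in `draw_x` are illustrative assumptions, not quantities from the paper; in practice μ_θ̂ and C_θ̂ would come from the paper's asymptotic characterization.

```python
import math
import random


def trace_product(C, S):
    """Return tr(C S) for two p x p matrices stored as nested lists."""
    p = len(C)
    return sum(C[i][j] * S[j][i] for i in range(p) for j in range(p))


def sample_predicted_scores(mu, C, second_moment, draw_x, n_samples, rng):
    """Sample mu^T x + g, with g ~ N(0, tr(C E[x x^T])) drawn independently of x."""
    noise_std = math.sqrt(trace_product(C, second_moment))
    scores = []
    for _ in range(n_samples):
        x = draw_x(rng)
        mean_part = sum(m * xi for m, xi in zip(mu, x))  # generally non-Gaussian
        scores.append(mean_part + rng.gauss(0.0, noise_std))
    return scores


# Hypothetical toy setting: every value below is an illustrative assumption.
P = 5


def draw_x(rng):
    """Standard Gaussian coordinates, except a bimodal second coordinate."""
    x = [rng.gauss(0.0, 1.0) for _ in range(P)]
    x[1] = rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 0.5)
    return x


# E[x x^T] is diagonal here: 1 for the Gaussian coordinates and
# 2^2 + 0.5^2 = 4.25 for the bimodal coordinate.
second_moment = [[0.0] * P for _ in range(P)]
for i in range(P):
    second_moment[i][i] = 4.25 if i == 1 else 1.0

mu = [0.5, 1.0, 0.0, 0.0, 0.0]  # assumed estimator mean
C = [[0.02 if i == j else 0.0 for j in range(P)] for i in range(P)]  # assumed covariance

rng = random.Random(0)
scores = sample_predicted_scores(mu, C, second_moment, draw_x, 20000, rng)
noise_var = trace_product(C, second_moment)
print(f"Gaussian noise variance tr(C E[xx^T]) = {noise_var:.3f}")
```

A histogram of `scores` in this toy setting would show two smoothed modes inherited from the bimodal coordinate, which a single Gaussian with matching mean and variance cannot reproduce.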

What carries the argument

The heuristic extension of the Convex Gaussian Min-Max Theorem to non-Gaussian data designs, which produces an asymptotic min-max characterization of the ERM estimator statistics including its mean and covariance.

If this is right

Approximations for the mean μ_θ̂ and covariance C_θ̂ of the ERM estimator become available even for non-Gaussian designs.
Any C² regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at μ_θ̂.
The result specifies the exact form in which Gaussian universality holds or breaks for projections in ERM.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This characterization could guide the design of better uncertainty estimates for models trained on non-Gaussian data such as images or sensor readings.
Future work might extend the same heuristic to other performance measures like generalization error or to non-convex losses.
Finite-sample corrections or concentration rates could be derived to make the asymptotic result more practical for moderate dimensions.

Load-bearing premise

The heuristic extension of the Convex Gaussian Min-Max Theorem applies to non-Gaussian data under the stated concentration assumption on the data matrix.

What would settle it

Empirical histograms of θ̂⊤x from simulations with non-Gaussian data that deviate significantly from the predicted convolution distribution would falsify the approximation.
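One way to make such a comparison quantitative is a two-sample Kolmogorov-Smirnov statistic between empirical scores and draws from the predicted convolution. The sketch below is an assumption about how the check could be run, not the paper's procedure; the three samples are synthetic placeholders standing in for real values of θ̂⊤x, for draws from the predicted law, and for a Gaussian with matching mean and variance.

```python
import bisect
import math
import random


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_a(t) - F_b(t)|."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum is attained at one of the sample points.
    for t in a_sorted + b_sorted:
        fa = bisect.bisect_right(a_sorted, t) / len(a_sorted)
        fb = bisect.bisect_right(b_sorted, t) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def draw_convolution(rng):
    """Placeholder law: a two-point mean projection plus Gaussian noise."""
    return rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 0.6)


rng = random.Random(1)
# In a real check, `empirical` would hold theta_hat^T x over independent test
# points; here it is drawn from the same placeholder law as `predicted`.
empirical = [draw_convolution(rng) for _ in range(2000)]
predicted = [draw_convolution(rng) for _ in range(2000)]
# Gaussian with matching mean 0 and variance 2^2 + 0.6^2 = 4.36.
gaussian = [rng.gauss(0.0, math.sqrt(4.36)) for _ in range(2000)]

ks_predicted = ks_two_sample(empirical, predicted)
ks_gaussian = ks_two_sample(empirical, gaussian)
print(f"KS vs predicted convolution: {ks_predicted:.3f}")
print(f"KS vs matched Gaussian:      {ks_gaussian:.3f}")
```

A KS statistic against the predicted convolution that stays large as the sample size and dimension grow would count as evidence against the approximation; a small value against the convolution alongside a large value against the matched Gaussian is the pattern the paper's figures describe.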

Figures

Figures reproduced from arXiv: 2604.03146 by Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao.

**Figure 1.** Non-Gaussian Decision Scores and Classification Error. Left: Empirical histograms of decision scores for Class 0 (light blue) and Class 1 (light red) exhibit non-Gaussian distributions that align closely with theoretical predictions (dashed blue). Gaussian approximations (green dashed) fail to capture the skewness and bimodality per class. Right: Classification error as a function of the regularization pa…

**Figure 2.** We examine different score distributions for various regularization functions ρ : θ ↦ a⊤θ + ∥θ∥², where a = (−cos(ϕ), sin(ϕ), 0, ..., 0) for some angle ϕ = 0, π/2, π (from bottom left to bottom right). We use the squared loss L_y(z) = (z − y)², with y = θ*⊤x + ε, where θ* = e₁. For all i ∈ [p] \ {2}, x⊤e_i ∼ N(0, 1), while x⊤e₂ follows a bimodal distribution. According to Corollary 7.2, the sc…

**Figure 3.** Denote F_a := F + span(a) and F_a^⊥ := (F + span(a))^⊥, and write P_E for the orthogonal projection onto a subspace E. We define J_µ(µ) := E…

**Figure 4.** Universality breakdown on MNIST data. Left: Empirical histograms of decision scores θ̂⊤x for Class 0 (light blue) and Class 1 (light red), compared with a Gaussian approximation of matching mean and variance (green dashed) and with the corrected theoretical density (dashed blue). Right: Generalization performance. Predictions based on Gaussian score universality (green dashed) fail to match empirical re…

Original abstract

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, a heuristic extension of the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian designs yields an asymptotic min-max characterization of high-dimensional convex ERM. This enables approximation of the mean μ_θ̂ and covariance C_θ̂ of the estimator θ̂. Specifically, for a test covariate x independent of training data, θ̂^T x approximately follows the convolution of the (generally non-Gaussian) distribution of μ_θ̂^T x with an independent centered Gaussian of variance Tr(C_θ̂ E[xx^T]). The paper additionally proves that any C² regularizer is asymptotically equivalent to a quadratic form determined by its Hessian at zero and gradient at μ_θ̂, with numerical simulations provided for validation.

Significance. If the heuristic CGMT extension can be justified with controllable error, the work would clarify the scope and limits of Gaussian universality for ERMs by supplying explicit distributional approximations in non-Gaussian settings. The regularizer equivalence result is a clean simplification that could streamline future analyses. The simulations offer qualitative support, though the absence of quantitative error metrics limits the strength of the empirical backing.

major comments (3)

[Abstract] Abstract and main derivation: the asymptotic min-max characterization and the convolution form for θ̂^T x rest on a heuristic extension of the CGMT under the stated concentration assumption; no proof, concentration inequalities, or remainder terms are supplied to control the non-Gaussian fluctuation terms that the original CGMT exploits, making this step load-bearing for the central claim.
[Abstract] Abstract: the claim that the projection follows the stated convolution is presented as approximate, yet the manuscript invokes only standard regularity conditions on the loss and regularizer without deriving explicit error bounds or rates for the non-Gaussian case; this leaves the approximation's validity range unquantified.
[Numerical simulations] Numerical simulations section: the validation of the theoretical predictions is cited, but no error bars, explicit approximation-error metrics, or details on the number of trials are reported, weakening the empirical support for the key non-Gaussian characterization.

minor comments (2)

[Notation] Notation: ensure uniform definition of the concentration assumption on the data matrix and consistent use of symbols for μ_θ̂ and C_θ̂ across the derivation and statements.
[References] References: include additional citations to recent results on non-Gaussian high-dimensional statistics to better situate the heuristic extension relative to existing literature.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the careful reading and constructive comments. We clarify below that the core results rely on a heuristic extension of the CGMT, and we address each major point directly while agreeing to strengthen the empirical section.

Point-by-point responses

Referee: [Abstract] Abstract and main derivation: the asymptotic min-max characterization and the convolution form for θ̂^T x rest on a heuristic extension of the CGMT under the stated concentration assumption; no proof, concentration inequalities, or remainder terms are supplied to control the non-Gaussian fluctuation terms that the original CGMT exploits, making this step load-bearing for the central claim.

Authors: We agree that the extension is heuristic and that the manuscript supplies no proof, concentration inequalities, or remainder terms controlling the non-Gaussian fluctuations. The concentration assumption on the data matrix is invoked to justify replacing the design with an effective Gaussian one inside the min-max problem, but we do not derive explicit error control. This is an acknowledged limitation of the present analysis; the heuristic is used to obtain the min-max characterization and the convolution form. We will revise the abstract and introduction to state the heuristic character more explicitly and to discuss the role of the concentration assumption.
revision: partial

Referee: [Abstract] Abstract: the claim that the projection follows the stated convolution is presented as approximate, yet the manuscript invokes only standard regularity conditions on the loss and regularizer without deriving explicit error bounds or rates for the non-Gaussian case; this leaves the approximation's validity range unquantified.

Authors: The convolution is presented as an asymptotic approximation without explicit error bounds or rates. Deriving quantitative rates for the non-Gaussian case would require a substantially more technical analysis of the CGMT extension, which lies outside the scope of this work. The standard regularity conditions on the loss and regularizer are used only to guarantee existence and uniqueness of the solution to the min-max problem. We will revise the abstract to qualify the approximation more clearly and to note the absence of explicit rates.
revision: partial

Referee: [Numerical simulations] Numerical simulations section: the validation of the theoretical predictions is cited, but no error bars, explicit approximation-error metrics, or details on the number of trials are reported, weakening the empirical support for the key non-Gaussian characterization.

Authors: We accept this criticism. In the revised manuscript we will specify the number of Monte Carlo trials (typically 100), add error bars to all plots, and report quantitative approximation-error metrics such as the Kolmogorov-Smirnov statistic and mean absolute deviation between the empirical distribution of θ̂^T x and the predicted convolution.
revision: yes
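A rough sketch of how such a report could be assembled is given below. It interprets "mean absolute deviation" as the average absolute gap between two empirical CDFs on a fixed grid, which is an assumption, since the rebuttal does not define the metric. The samples are synthetic placeholders, and the grid, sample size, and the helper names `ecdf`, `mean_abs_cdf_deviation`, and `mean_and_standard_error` are hypothetical.

```python
import math
import random
import statistics


def ecdf(sample, t):
    """Empirical CDF of `sample` evaluated at t."""
    return sum(1 for s in sample if s <= t) / len(sample)


def mean_abs_cdf_deviation(empirical, predicted, grid):
    """Average of |F_emp(t) - F_pred(t)| over the points of `grid`."""
    return statistics.fmean(
        abs(ecdf(empirical, t) - ecdf(predicted, t)) for t in grid
    )


def mean_and_standard_error(values):
    """Mean over trials and its standard error (half-length of a 1-sigma error bar)."""
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, std_err


def draw_score(rng):
    """Placeholder score law: a two-point mean projection plus Gaussian noise."""
    return rng.choice([-2.0, 2.0]) + rng.gauss(0.0, 0.6)


N_TRIALS = 100  # the rebuttal mentions "typically 100" Monte Carlo trials
grid = [-4.0 + 0.4 * k for k in range(21)]  # assumed evaluation grid on [-4, 4]
rng = random.Random(2)

deviations = []
for _ in range(N_TRIALS):
    # In a real experiment, `empirical` would come from a freshly trained estimator.
    empirical = [draw_score(rng) for _ in range(200)]
    predicted = [draw_score(rng) for _ in range(200)]
    deviations.append(mean_abs_cdf_deviation(empirical, predicted, grid))

mad_mean, mad_se = mean_and_standard_error(deviations)
print(f"mean absolute CDF deviation: {mad_mean:.4f} +/- {mad_se:.4f}")
```

Reporting the per-trial metric with its standard error, rather than a single pooled number, is what would supply the error bars the referee asks for.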

standing simulated objections not resolved

The absence of a rigorous proof or explicit error bounds for the heuristic CGMT extension under non-Gaussian designs; supplying such a proof would require a major technical development beyond the scope of the present manuscript.

Circularity Check

0 steps flagged

No circularity: derivation rests on external heuristic assumption rather than self-reduction

full rationale

The paper derives its asymptotic min-max characterization and the convolution form for θ̂⊤x explicitly from a stated heuristic extension of the CGMT together with a concentration assumption on the data matrix and standard regularity conditions. No equation in the provided text reduces the target result to a fitted parameter, a self-citation chain, or a quantity defined in terms of itself. The central claim is not obtained by renaming a known empirical pattern or by smuggling an ansatz through prior self-work; it is presented as following from the heuristic step. Because the load-bearing step is an external modeling assumption rather than an internal tautology, the derivation chain does not exhibit any of the enumerated circularity patterns and receives the default non-circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on a heuristic extension of CGMT together with an unproven concentration assumption on the data matrix and standard regularity conditions on loss and regularizer; no free parameters or new entities are introduced.

axioms (2)

domain assumption: concentration assumption on the data matrix
Invoked to heuristically extend the Convex Gaussian Min-Max Theorem to non-Gaussian settings.
domain assumption: standard regularity conditions on the loss and regularizer
Required to obtain the asymptotic min-max characterization and the quadratic equivalence for C² regularizers.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

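By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization... $J(\mu, \alpha, \kappa; \beta, \nu) = \frac{\beta^2 \kappa}{2} + \mathbb{E}[\tilde{L}_y(\mu^\top x + \alpha z; \kappa)] + \rho(\mu) - \frac{\nu \alpha^2}{2} - \frac{\beta^2}{2n} \mathrm{tr}(C_x (\nu C_x + H_\rho)^{-1})$
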
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

any C² regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at μ_θ̂

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

α-TCAV: A Unified Framework for Testing with Concept Activation Vectors
stat.ML · 2026-05 · unverdicted · novelty 7.0

α-TCAV replaces TCAV's hard indicator with a tunable smooth function to create a unified probabilistic framework with lower variance and guidance for parameter choice or Bayes-optimal scoring.