Can we spot a fake?

Grigoris Paouris, Roman Vershynin, Shahar Mendelson

Authors on Pith no claims yet

classification 🧮 math.ST math.PRstat.TH

keywords dataadversaryfakeradiusarbitraryboundfixedgaussian

read the original abstract

The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's ``tricks'', and where $r>0$ is a fixed radius. The adversary's choice of $t=t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X+rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r=r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian distributions of real data $X$. The lower bound may fail for not highly symmetric $T$, but we conjecture that this problem can be solved by considering the focused version of the Gaussian width of $T$, which focuses on the most important directions of $T$.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On Talagrand's Convexity Conjecture
math.PR 2026-05 unverdicted novelty 8.0

Any centered 1-subgaussian random vector equals the sum of a universal number of standard Gaussians, solving Talagrand's convexity conjecture.