arxiv: 2604.03882 · v1 · submitted 2026-04-04 · 🧮 math.PR · math.FA

A homogenization principle for total variation

Aryeh Kontorovich

Pith reviewed 2026-05-13 16:52 UTC · model grok-4.3

classification: math.PR, math.FA

keywords: total variation, product measures, homogenization, convolution inequality, probability measures, total variation distance, embedding

The pith

Total variation between product distributions is bounded below by a constant times that of their averaged versions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the total variation distance between two product probability measures is at least a positive universal constant times the total variation distance between the n-fold products of their averaged marginals. Equivalently, replacing every marginal by the average can increase the total variation by at most the factor 1/c, uniformly in n and in the underlying space. A reader would care because the inequality relates heterogeneous product measures to i.i.d. ones, which are usually easier to analyze in probability and statistics. The proof embeds each pair of measures into a positive measure on the real line and uses a functional T that expresses the total variation of products through convolution of the embedded measures.

Core claim

For arbitrary probability measures P_1,...,P_n and Q_1,...,Q_n on a measurable space, the total variation between the product measures ⊗P_i and ⊗Q_i is at least c times the total variation between the n-fold products of the averages bar P and bar Q, where c > 0 is a universal constant. The proof has three steps: embed each pair (P_i, Q_i) into a positive measure η_i on R and define a functional T so that the TV of the products equals T(η_1 * ⋯ * η_n); show that T(η_1 * ⋯ * η_n) ≥ c T(bar η^{*n}), where bar η is the average of the η_i; and lift back to show that T(bar η^{*n}) is at least the TV of the averaged products.
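
The three steps chain together as follows. This display only restates the relations given in the abstract in one line, with $\bar\eta$ denoting the average of the $\eta_i$.

```latex
\[
TV\Bigl(\bigotimes_{i=1}^n P_i,\ \bigotimes_{i=1}^n Q_i\Bigr)
  \;=\; T(\eta_1 * \cdots * \eta_n)
  \;\ge\; c\, T\bigl(\bar\eta^{*n}\bigr)
  \;\ge\; c\, TV\bigl(\bar P^{\otimes n},\ \bar Q^{\otimes n}\bigr).
\]
```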

What carries the argument

A one-dimensional embedding of probability measure pairs into positive measures on R, together with a functional T over measures on R that realizes the total variation of product measures exactly via convolution of the embedded measures.

Load-bearing premise

The embedding of the probability measures into positive measures on the real line allows the total variation of the products to be exactly represented by the functional T applied to their convolutions.
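
The exact representation can be sanity-checked numerically on finite spaces. The sketch below uses T(η) = ½∫|e^x − e^{-x}| η(dx), as quoted in this page's Lean-link excerpt. The embedding itself is an assumption and is not taken from the paper: each atom with P-mass p > 0 and Q-mass q > 0 is sent to the point x = ½ log(p/q) with weight sqrt(p·q), which satisfies the quoted normalization ∫e^{±x} η(dx) = 1 when both measures are strictly positive. The names `embed`, `convolve`, `T`, and `tv_product` are hypothetical.

```python
import functools
import itertools
import math
from collections import defaultdict


def embed(p, q):
    """ASSUMED embedding (not confirmed by the source): atom at
    x = 0.5*log(p_k/q_k) with weight sqrt(p_k*q_k). Requires p_k, q_k > 0."""
    eta = defaultdict(float)
    for pk, qk in zip(p, q):
        eta[0.5 * math.log(pk / qk)] += math.sqrt(pk * qk)
    return dict(eta)


def convolve(eta1, eta2):
    """Convolution of two finitely supported positive measures on R."""
    out = defaultdict(float)
    for (x, a), (y, b) in itertools.product(eta1.items(), eta2.items()):
        out[x + y] += a * b
    return dict(out)


def T(eta):
    """T(eta) = 1/2 * integral of |e^x - e^{-x}| against eta."""
    return 0.5 * sum(abs(math.exp(x) - math.exp(-x)) * m for x, m in eta.items())


def tv_product(ps, qs):
    """Brute-force total variation between the two product measures."""
    total = 0.0
    for xs in itertools.product(*(range(len(p)) for p in ps)):
        pp = math.prod(p[x] for p, x in zip(ps, xs))
        qq = math.prod(q[x] for q, x in zip(qs, xs))
        total += abs(pp - qq)
    return 0.5 * total


# Illustrative strictly positive marginals (made-up numbers).
ps = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
qs = [[0.6, 0.4], [0.7, 0.3], [0.3, 0.7]]
etas = [embed(p, q) for p, q in zip(ps, qs)]
conv = functools.reduce(convolve, etas)
print(T(conv), tv_product(ps, qs))  # the two values should agree up to rounding
```

Under this assumed embedding the agreement is an algebraic identity, since e^{±x}·sqrt(p·q) recovers p and q and the sum of the log-ratios corresponds to the product of the likelihood ratios. Pairs with p = 0 or q = 0 on some atom fall outside this sketch, which is exactly the regime the referee's first major comment asks the paper to verify.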

What would settle it

A family of measures P_i and Q_i for which the ratio TV(⊗P_i, ⊗Q_i) / TV(bar P^{⊗n}, bar Q^{⊗n}) tends to zero as n grows would falsify the existence of a universal c.
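
A numerical search of this kind can be sketched for Bernoulli marginals. The code below is an illustrative brute-force experiment, not the paper's method: it enumerates all 2^n outcomes, so it is feasible only for small n, and a random search that finds no small ratio proves nothing about the universal constant. All function names are hypothetical.

```python
import itertools
import math
import random


def tv_product(ps, qs):
    """Brute-force total variation between the two product measures."""
    total = 0.0
    for xs in itertools.product(*(range(len(p)) for p in ps)):
        pp = math.prod(p[x] for p, x in zip(ps, xs))
        qq = math.prod(q[x] for q, x in zip(qs, xs))
        total += abs(pp - qq)
    return 0.5 * total


def average(marginals):
    """Coordinate-wise average of a list of pmfs on the same finite set."""
    n = len(marginals)
    return [sum(m[k] for m in marginals) / n for k in range(len(marginals[0]))]


def ratio(ps, qs):
    """TV(product of P_i, product of Q_i) / TV(bar P^n, bar Q^n)."""
    n = len(ps)
    numerator = tv_product(ps, qs)
    denominator = tv_product([average(ps)] * n, [average(qs)] * n)
    return numerator / denominator if denominator > 0 else math.inf


def smallest_ratio(n, trials, seed=0):
    """Smallest ratio seen over random Bernoulli instances of size n."""
    rng = random.Random(seed)

    def bernoulli():
        a = rng.uniform(0.01, 0.99)
        return [a, 1.0 - a]

    best = math.inf
    for _ in range(trials):
        ps = [bernoulli() for _ in range(n)]
        qs = [bernoulli() for _ in range(n)]
        best = min(best, ratio(ps, qs))
    return best


for n in (2, 4, 6):
    print(n, smallest_ratio(n, trials=200))
```

If the inequality holds, the printed minima should stay bounded away from zero as n grows. Uniformly random marginals are unlikely to be extremal, so a serious falsification attempt would replace the random draw with a structured or optimized family.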

Original abstract

We prove an inequality comparing the variational distance between pairs of product probability measures to its homogenized counterpart. If $P_1,\ldots,P_n,Q_1,\ldots,Q_n$ are arbitrary probability measures on a measurable space and $\bar P:=\frac1n\sum_{i=1}^n P_i$, $\bar Q:=\frac1n\sum_{i=1}^n Q_i$, we show that $$TV\!\left(\bigotimes_{i=1}^n P_i, \bigotimes_{i=1}^n Q_i\right) \;\ge\; c\,TV(\bar P^{\otimes n},\bar Q^{\otimes n}),$$ where $c>0$ is a universal constant. The proof is based on a one-dimensional representation of total variation between products. We embed pairs of probability distributions $P_i,Q_i$ into positive measures $\eta_i$ on $\mathbb{R}$. We then define a functional $T$ over measures on $\mathbb{R}$ that realizes TV over products via convolution: $TV\!\left(\bigotimes_{i=1}^n P_i, \bigotimes_{i=1}^n Q_i\right)=T(\eta_1*\cdots *\eta_n)$. Our main analytic discovery is that for the relevant class of positive measures $\eta_i$, the convolution inequality $T(\eta_1*\cdots*\eta_n) \ge c\,T\!\left(\bar\eta^{*n}\right)$ holds, where $\bar\eta=\frac1n\sum_{i=1}^n \eta_i$. Finally, a higher-dimensional lifting argument shows that $T\!\left(\bar\eta^{*n}\right)\ge TV(\bar P^{\otimes n},\bar Q^{\otimes n})$. To our knowledge, both the exact representation and the convolution inequality are new.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proves a homogenization inequality for total variation: for arbitrary probability measures P_1,...,P_n and Q_1,...,Q_n on a measurable space, with averages bar P and bar Q, one has TV(⊗_{i=1}^n P_i, ⊗_{i=1}^n Q_i) ≥ c TV(bar P^{⊗n}, bar Q^{⊗n}) for a universal constant c>0. The argument proceeds by embedding each pair (P_i,Q_i) into positive measures η_i on R, introducing a functional T on positive measures on R such that TV of the products equals T of the convolution η_1 * ⋯ * η_n, establishing the convolution inequality T(η_1*⋯*η_n) ≥ c T(bar η^{*n}), and finally lifting T(bar η^{*n}) ≥ TV(bar P^{⊗n}, bar Q^{⊗n}). Both the representation and the convolution inequality are presented as new.

Significance. If the central inequality holds, the result supplies a dimension-free lower bound relating heterogeneous product total variation to its homogenized counterpart. This could find use in concentration, empirical-process theory, and statistical testing where one wishes to reduce to i.i.d. cases. The one-dimensional embedding and the associated convolution inequality for T constitute new analytic machinery that may be of independent interest beyond the present application.

major comments (3)

[Section 2 (representation via embedding)] The representation equality TV(⊗ P_i, ⊗ Q_i) = T(η_1 * ⋯ * η_n) is load-bearing; it must hold exactly for every collection of probability measures, including those with atoms or mutually singular components. The construction of the embedding map and the functional T (Section 2) needs an explicit verification that the equality is preserved under convolution for all such pairs, not merely for a dense subclass.
[Section 3 (convolution inequality)] The convolution inequality T(η_1 * ⋯ * η_n) ≥ c T(bar η^{*n}) is asserted for the image of the embedding map. It is unclear whether the class of admissible η_i is closed under averaging and convolution or whether the inequality requires additional regularity (e.g., absolute continuity or moment bounds) that the embedding does not automatically guarantee (Section 3, main analytic step).
[Section 4 (lifting)] The final lifting step T(bar η^{*n}) ≥ TV(bar P^{⊗n}, bar Q^{⊗n}) must recover the total-variation distance of the averaged measures without loss of the universal constant c. The argument should be checked for cases in which the averaged measures bar P and bar Q have different supports from the original collection (Section 4, lifting argument).

minor comments (2)

[Introduction] The abstract states that both the representation and the convolution inequality are new; a short comparison paragraph with existing one-dimensional representations of total variation (e.g., via cumulative distribution functions) would help readers assess novelty.
[Section 3] Notation for the averaged measure bar η is introduced after the convolution inequality is stated; moving the definition earlier would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive report. The comments identify points where additional explicit verification would strengthen the manuscript. We address each major comment below and will revise accordingly.

Referee: [Section 2 (representation via embedding)] The representation equality TV(⊗ P_i, ⊗ Q_i) = T(η_1 * ⋯ * η_n) is load-bearing; it must hold exactly for every collection of probability measures, including those with atoms or mutually singular components. The construction of the embedding map and the functional T (Section 2) needs an explicit verification that the equality is preserved under convolution for all such pairs, not merely for a dense subclass.

Authors: We agree that the representation must be verified directly for general measures. The embedding maps each pair (P_i, Q_i) to a positive measure η_i on R by integrating against a fixed separating function; T is defined so that it recovers the total-variation functional on the product space. Because total variation is a supremum over bounded measurable functions and convolution corresponds exactly to the product measure, the equality holds by direct substitution for arbitrary measures, including atoms and mutually singular parts. To make this fully transparent we will insert a short lemma in Section 2 that carries out the verification explicitly on atomic measures and on the singular-continuous decomposition, confirming that no approximation step is used. revision: yes
Referee: [Section 3 (convolution inequality)] The convolution inequality T(η_1 * ⋯ * η_n) ≥ c T(bar η^{*n}) is asserted for the image of the embedding map. It is unclear whether the class of admissible η_i is closed under averaging and convolution or whether the inequality requires additional regularity (e.g., absolute continuity or moment bounds) that the embedding does not automatically guarantee (Section 3, main analytic step).

Authors: The image class is closed under averaging and convolution: each η_i has total mass 1 and the average bar η is again the image of the averaged pair (bar P, bar Q). The proof of the convolution inequality in Section 3 relies only on positivity of the measures and the specific variational definition of T; it does not invoke absolute continuity or moment conditions. The argument proceeds by reducing the inequality to a one-dimensional convolution estimate that holds for all positive finite measures. We will revise the opening paragraph of Section 3 to state the precise class explicitly and add a short remark confirming that the analytic step applies verbatim to the embedded measures without extra regularity assumptions. revision: partial
Referee: [Section 4 (lifting)] The final lifting step T(bar η^{*n}) ≥ TV(bar P^{⊗n}, bar Q^{⊗n}) must recover the total-variation distance of the averaged measures without loss of the universal constant c. The argument should be checked for cases in which the averaged measures bar P and bar Q have different supports from the original collection (Section 4, lifting argument).

Authors: The lifting applies the identical embedding to the averaged measures bar P and bar Q, so T(bar η^{*n}) is defined exactly as the total variation of the n-fold product of the embedded averages. Because the embedding is measure-preserving for total variation and the constant c originates from the convolution step (which is independent of support), no loss occurs. Supports of bar P and bar Q are contained in the union of the original supports, but total variation is insensitive to this inclusion. We will add a brief paragraph at the end of Section 4 that records this support relation and verifies that the inequality remains valid with the same c. revision: yes
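
The closure question in the Section 3 exchange can be checked by hand for the normalization quoted in this page's Lean-link excerpt, $\int e^{\pm x}\,\eta(dx)=1$. The sketch below assumes that the admissible class consists of the positive measures on $\mathbb{R}$ satisfying these two constraints; the paper's actual class may carry further conditions.

```latex
% Averaging: both constraints are linear in \eta.
\[
\int e^{\pm x}\,\bar\eta(dx)
  \;=\; \frac1n\sum_{i=1}^n \int e^{\pm x}\,\eta_i(dx)
  \;=\; 1.
\]
% Convolution: e^{\pm(x+y)} = e^{\pm x} e^{\pm y}, and Tonelli applies to positive measures.
\[
\int e^{\pm z}\,(\eta_1*\eta_2)(dz)
  \;=\; \iint e^{\pm(x+y)}\,\eta_1(dx)\,\eta_2(dy)
  \;=\; \Bigl(\int e^{\pm x}\,\eta_1(dx)\Bigr)\Bigl(\int e^{\pm y}\,\eta_2(dy)\Bigr)
  \;=\; 1.
\]
```

Both computations use only positivity and the two exponential constraints, with no absolute continuity or moment bounds beyond the finiteness of the exponential integrals.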

Circularity Check

0 steps flagged

No significant circularity; novel embedding and convolution inequality are independent of inputs

The derivation constructs an embedding of arbitrary (P_i, Q_i) into positive measures η_i on R, defines functional T such that TV(⊗P_i, ⊗Q_i) = T(η_1 * ⋯ * η_n) holds by the chosen representation, proves the new convolution inequality T(η_1 * ⋯ * η_n) ≥ c T(bar η^{*n}) for the induced class, and applies a lifting step T(bar η^{*n}) ≥ TV(bar P^{⊗n}, bar Q^{⊗n}). None of these steps reduces the target inequality to a fitted parameter, self-citation chain, or definitional tautology; the representation equality and convolution bound are established as fresh analytic results rather than by construction equating outputs to inputs. The argument is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The claim rests on standard properties of total variation and convolution together with two paper-specific constructions: the embedding into measures on R and the functional T. No free parameters are fitted; the constant c is existential.

axioms (1)

[standard math] Total variation distance and convolution are well-defined for probability measures and positive measures on R.
Invoked throughout the embedding and convolution steps; standard in measure-theoretic probability.

invented entities (2)

Embedding map from pairs (P_i, Q_i) to positive measures η_i on R (no independent evidence)
purpose: To reduce multidimensional product TV to a one-dimensional convolution problem
Defined in the paper to enable the representation TV(products) = T(convolution of η's)
Functional T on positive measures on R (no independent evidence)
purpose: To realize the total variation of the product measures exactly via convolution
Invented to convert the original inequality into a convolution inequality on R

pith-pipeline@v0.9.0 · 5637 in / 1337 out tokens · 35616 ms · 2026-05-13T16:52:46.682273+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel; dAlembert_to_ODE_general echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

T(η) = ½∫|e^x − e^{-x}| η(dx) with ∫e^{±x} η(dx) = 1; TV(⊗P_i, ⊗Q_i) = T(η_1 * ⋯ * η_n)
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean costAlphaLog_high_calibrated_iff; J_uniquely_calibrated_via_higher_derivative echoes

admissible measures closed under convolution; mass-defect α and multilinear Ψ representation
