arxiv: 2605.03839 · v1 · submitted 2026-05-05 · 💻 cs.DS · cs.LG· math.PR

Recognition: unknown

On Computing Total Variation Distance Between Mixtures of Product Distributions

Weiming Feng , Yucheng Fu , Minji Yang , Anqi Zhang

Authors on Pith no claims yet

Pith reviewed 2026-05-07 14:02 UTC · model grok-4.3

classification 💻 cs.DS cs.LGmath.PR

keywords total variation distancemixture of productsproduct distributionsapproximation algorithmsBoolean subcubes#P-hardnesscomputational complexity

0 comments

The pith

Mixtures of k product distributions over [q]^n have their total variation distance approximable to multiplicative (1 ± ε) by a randomized algorithm in poly((nq)^{k1+k2}, 1/ε) time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops computational methods to approximate or exactly compute the total variation distance between two mixtures of product distributions over a discrete n-dimensional domain. A randomized algorithm achieves multiplicative approximation in time polynomial in (nq) raised to the total number of components across both mixtures. This is useful because total variation distance quantifies how distinguishable two probabilistic models are, and product mixtures are standard for capturing dependencies in high-dimensional discrete data. For the special case of mixtures of Boolean subcubes, the authors provide a deterministic exact algorithm running in time polynomial in n but exponential in the number of components, together with a proof that exact computation is #P-hard once the total number of components grows linearly with the dimension.

Core claim

Given two mixtures P and Q consisting of k1 and k2 product distributions over [q]^n respectively, there exists a randomized algorithm that approximates d_TV(P, Q) to within multiplicative factor (1 ± ε) in time poly((nq)^{k1+k2}, 1/ε). For mixtures of Boolean subcubes over {0,1}^n, the total variation distance can be computed exactly by a deterministic algorithm in time poly(n, 2^{O(k1+k2)}), yet the problem is #P-hard whenever k1 + k2 = Θ(n).

What carries the argument

The randomized approximation procedure together with the deterministic exact procedure for subcube mixtures, both relying on explicit access to the individual product components of the input mixtures.

If this is right

When the total number of components is small relative to the dimension, the distance between two such mixtures becomes efficiently computable even for large n and q.
Exact numerical comparison is feasible for any fixed number of Boolean subcubes without incurring approximation error.
No polynomial-time exact algorithm for the subcube case can exist when the number of components scales linearly with n, unless #P collapses to P.
The approximation result immediately yields efficient procedures for deciding whether two mixture models are close or far in total variation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same algorithmic template could be tested on other statistical distances such as Hellinger or chi-squared between the same class of mixtures.
In applications that fit mixture models to data, the procedure offers a way to compare two fitted models without generating large synthetic samples.
The exponential dependence on k in the subcube exact algorithm suggests a natural parameterization by the number of components that may admit fixed-parameter tractable extensions to other discrete domains.

Load-bearing premise

The mixtures are supplied as explicit lists of their k1 and k2 component product distributions so that each component can be enumerated or accessed directly.

What would settle it

A concrete mixture instance with small fixed k1 and k2 for which the randomized algorithm's output deviates from the true d_TV(P, Q) by more than a (1 + ε) factor, or an instance of subcube mixtures with k1 + k2 = Θ(n) that admits an exact polynomial-time solution.

read the original abstract

We study the problem of approximating the total variation distance between two mixtures of product distributions over an $n$-dimensional discrete domain. Given two mixtures $\mathbb{P}$ and $\mathbb{Q}$ with $k_1$ and $k_2$ product distributions over $[q]^n$, respectively, we give a randomized algorithm that approximates $d_{\mathrm{TV}}\left({\mathbb{P}},{\mathbb{Q}}\right)$ within a multiplicative error of $(1\pm \varepsilon)$ in time $\mathrm{poly}((nq)^{k_1+k_2},1/\varepsilon)$. We also study the special case of mixtures of Boolean subcubes over $\{0,1\}^n$. For this class, we give a deterministic algorithm that exactly computes the total variation distance in time $\mathrm{poly}(n,2^{O(k_1+k_2)})$, and show that exact computation is $\#\mathsf{P}$-hard when $k_1+k_2=\Theta(n)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a randomized multiplicative approximation for TV distance between mixtures of k product distributions in time poly((nq)^k, 1/ε), plus exact computation and #P-hardness for Boolean subcubes.

read the letter

The core contribution is an algorithm that approximates the total variation distance between two mixtures of product distributions to within (1±ε) in time polynomial in (nq) raised to the total number of components. For the special case of mixtures of Boolean subcubes they also give a deterministic exact algorithm running in poly(n, 2^O(k)) time and prove that exact computation is #P-hard once k grows linearly with n. These bounds are stated cleanly and the split between the general product case and the subcube case is useful. The approximation result appears to rest on enumerating the mixture components and then applying standard sampling or dynamic programming over the low-dimensional support, which is a natural approach when k is treated as a small parameter. The hardness result for exact computation when k = Θ(n) is consistent with known counting problems on high-dimensional structures. The input model is the standard explicit one: the component distributions are given directly, so the runtime dependence on (nq)^k is expected rather than surprising. No circularity or hidden fitting shows up in the claims. The main limitation is that the approximation is only multiplicative; when the true distance is very small the guarantee can still be loose in absolute terms, though that is typical for this style of result. The paper is aimed at researchers in computational learning theory and high-dimensional probability who need to compute or bound distances between structured distributions. If you work on mixture models or property testing for discrete distributions, the runtime and hardness statements are worth seeing. I would send it to peer review; the algorithmic claims are concrete enough that a referee can check the proofs and the literature comparison without it being a desk reject.

Referee Report

0 major / 3 minor

Summary. The manuscript studies the computational problem of finding the total variation distance between two mixtures of product distributions over [q]^n, where the mixtures have k1 and k2 components respectively. It claims a randomized algorithm that returns a (1±ε)-multiplicative approximation to d_TV(P,Q) in time poly((nq)^{k1+k2},1/ε). For the special case of mixtures of Boolean subcubes over {0,1}^n it claims a deterministic exact algorithm running in poly(n,2^{O(k1+k2)}) time together with a #P-hardness proof for exact computation when k1+k2=Θ(n).

Significance. If the algorithmic claims and hardness reduction hold, the work supplies the first non-trivial positive results for TV distance between structured mixtures when the number of components is small, a regime that arises in many learning applications. The deterministic exact algorithm for subcubes and the matching hardness result together give a clean complexity picture for that subclass. The results are grounded in standard enumeration and dynamic-programming techniques over low-dimensional supports and therefore appear technically plausible.

minor comments (3)

The input representation of the mixtures is never stated explicitly. The claimed runtimes presuppose that each of the k1+k2 product distributions is given by an explicit description (e.g., its probability vector or parameter list) whose size is polynomial in nq; this assumption should be stated in the preliminaries or introduction and the input size should be related to the running-time bound.
The abstract and introduction would benefit from a short comparison paragraph placing the new runtimes against the naïve O((nq)^{k1+k2}) enumeration baseline and against any prior work on TV distance for single product distributions.
Notation for the mixtures P and Q is introduced in the abstract but never formally defined in the body; a short “Notation and Preliminaries” subsection would remove ambiguity between the mixture measure and its component distributions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work, recognition of its significance as the first non-trivial results in this regime, and recommendation for minor revision. We appreciate the note that the claims appear technically plausible based on standard techniques.

Circularity Check

0 steps flagged

No significant circularity; algorithmic claims are self-contained

full rationale

The paper states explicit runtime bounds for a randomized multiplicative approximation algorithm and a deterministic exact algorithm for Boolean subcubes, along with a #P-hardness result. These are standard complexity-theoretic claims supported by algorithmic constructions and reductions that do not reduce by definition or self-citation to the target quantities themselves. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described results; the derivation chain relies on enumeration over mixture components and dynamic programming, which are independent of the final distance value.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard definitions and properties of total variation distance, mixture distributions, and product distributions from probability theory, with no free parameters or invented entities introduced.

axioms (1)

standard math Standard properties of total variation distance and mixture distributions hold as defined in probability theory.
The algorithm and hardness claims rely on these foundational definitions.

pith-pipeline@v0.9.0 · 5477 in / 1402 out tokens · 98042 ms · 2026-05-07T14:02:06.432444+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

164 extracted references · 7 canonical work pages

[1]

A Distributional-Lifting Theorem for

Guy Blanc and Jane Lange and Carmen Strassle and Li. A Distributional-Lifting Theorem for. COLT , series =
[2]

Lifting Uniform Learners via Distributional Decomposition , booktitle =

Guy Blanc and Jane Lange and Ali Malik and Li. Lifting Uniform Learners via Distributional Decomposition , booktitle =
[3]

ICML , series =

Chinmaya Kausik and Kevin Tan and Ambuj Tewari , title =. ICML , series =
[4]

Nearly tight sample complexity bounds for learning

Hassan Ashtiani and Shai Ben. Nearly tight sample complexity bounds for learning. NeurIPS , pages =
[5]

COLT , series =

Jerry Li and Ludwig Schmidt , title =. COLT , series =
[6]

COLT , series =

Prateek Jain and Sewoong Oh , title =. COLT , series =
[7]

Gordon and Erik Jahn and Bijan Mazaheri and Yuval Rabani and Leonard J

Spencer L. Gordon and Erik Jahn and Bijan Mazaheri and Yuval Rabani and Leonard J. Schulman , title =. COLT , series =
[8]

Mazaheri and Yuval Rabani and Leonard J

Spencer Gordon and Bijan H. Mazaheri and Yuval Rabani and Leonard J. Schulman , title =. COLT , series =
[9]

Servedio , title =

Jon Feldman and Ryan O'Donnell and Rocco A. Servedio , title =
[10]

STOC , pages =

Sitan Chen and Ankur Moitra , title =. STOC , pages =
[11]

Meel and Dimitrios Myrisiotis and Aduri Pavan and N

Arnab Bhattacharyya and Sutanu Gayen and Kuldeep S. Meel and Dimitrios Myrisiotis and Aduri Pavan and N. V. Vinodchandran , title =. ICLR , publisher =
[12]

ICML , series =

Jeongyeol Kwon and Yonathan Efroni and Constantine Caramanis and Shie Mannor , title =. ICML , series =
[13]

Constantinos Daskalakis and Nishanth Dikkala and Gautam Kamath , title =
[14]

Amirali Abdullah and Ravi Kumar and Andrew McGregor and Sergei Vassilvitskii and Suresh Venkatasubramanian , title =
[15]

Takafumi Kanamori and Taiji Suzuki and Masashi Sugiyama , title =
[16]

Computing Divergences between Discrete Decomposable Models , booktitle =

Loong Kuan Lee and Nico Piatkowski and Fran. Computing Divergences between Discrete Decomposable Models , booktitle =
[17]

Sreejith Sreekumar and Ziv Goldfeld , title =. J. Mach. Learn. Res. , volume =
[18]

Rubenstein and Olivier Bousquet and Josip Djolonga and Carlos Riquelme and Ilya O

Paul K. Rubenstein and Olivier Bousquet and Josip Djolonga and Carlos Riquelme and Ilya O. Tolstikhin , title =. NeurIPS , pages =
[19]

On the Estimation of

Barnab. On the Estimation of
[20]

Liu, Jingcheng and Sinclair, Alistair and Srivastava, Piyush , journal =. The
[21]

arXiv preprint arXiv:2404.09674 , year=

A circus of circuits: Connections between decision diagrams, circuits, and automata , author=. arXiv preprint arXiv:2404.09674 , year=

work page arXiv
[22]

arXiv preprint arXiv:2506.23456 , year=

Sampling and Identity-Testing Without Approximate Tensorization of Entropy , author=. arXiv preprint arXiv:2506.23456 , year=

work page arXiv
[23]

Antonio Blanca and Zongchen Chen and Daniel Stefankovic and Eric Vigoda , title =
[24]

Topics and Techniques in Distribution Testing:

Cl. Topics and Techniques in Distribution Testing:. Found. Trends Commun. Inf. Theory , volume =
[25]

arXiv preprint arXiv:2502.10527 , year=

Algorithms and Hardness for Estimating Statistical Similarity , author=. arXiv preprint arXiv:2502.10527 , year=

work page arXiv
[26]

Electron

Kontorovich, Aryeh , TITLE =. Electron. Commun. Probab. , FJOURNAL =. 2025 , PAGES =

2025
[27]

Arnab Bhattacharyya and Weiming Feng and Piyush Srivastava , title =
[28]

SODA , pages =

Xiaoyu Chen and Xinyuan Zhang , title =. SODA , pages =
[29]

STOC , pages =

Xiaoyu Chen and Zongchen Chen and Yitong Yin and Xinyuan Zhang , title =. STOC , pages =
[30]

STOC , pages =

Guy Bresler , title =. STOC , pages =
[31]

Ankur Moitra , title =. J
[32]

COLT , series =

Weiming Feng and Hongyang Liu and Minji Yang , title =. COLT , series =
[33]

ALT , series =

Aryeh Kontorovich and Ariel Avital , title =. ALT , series =
[34]

arXiv preprint arXiv:2409.10368 , year=

On the tensorization of the variational distance , author=. arXiv preprint arXiv:2409.10368 , year=

work page arXiv
[35]

FOCS , pages =

Xiaoyu Chen and Weiming Feng and Yitong Yin and Xinyuan Zhang , title =. FOCS , pages =
[36]

A Survey on Distribution Testing: Your Data is Big

Canonne, Cl. A Survey on Distribution Testing: Your Data is Big. But is it Blue? , year =. doi:10.4086/toc.gs.2020.009 , publisher =

work page doi:10.4086/toc.gs.2020.009 2020
[37]

Testing Probability Distributions Underlying Aggregated Data , booktitle =

Cl. Testing Probability Distributions Underlying Aggregated Data , booktitle =
[38]

ICML , series =

Jamil Arbas and Hassan Ashtiani and Christopher Liaw , title =. ICML , series =
[39]

arXiv preprint arXiv:1810.08693 , year=

The total variation distance between high-dimensional Gaussians with the same mean , author=. arXiv preprint arXiv:1810.08693 , year=

work page arXiv
[40]

ICALP , series =

Stefan Kiefer , title =. ICALP , series =
[41]

LICS , pages =

Taolue Chen and Stefan Kiefer , title =. LICS , pages =
[42]

Corinna Cortes and Mehryar Mohri and Ashish Rastogi , title =. Int. J. Found. Comput. Sci. , volume =
[43]

Henzinger and Jean

Laurent Doyen and Thomas A. Henzinger and Jean. Equivalence of Labeled Markov Chains , journal =
[44]

Murawski and Jo

Stefan Kiefer and Andrzej S. Murawski and Jo. Language Equivalence for Probabilistic Automata , booktitle =
[45]

2003 , publisher=

Counting, sampling and integrating: algorithms and complexity , author=. 2003 , publisher=

2003
[46]

SODA , pages =

Weiming Feng and Liqiang Liu and Tianren Liu , title =. SODA , pages =
[47]

Meel and Dimitrios Myrisiotis and A

Arnab Bhattacharyya and Sutanu Gayen and Kuldeep S. Meel and Dimitrios Myrisiotis and A. Pavan and N. V. Vinodchandran , title =. IJCAI , pages =
[48]

2017 , publisher=

Markov chains and mixing times , author=. 2017 , publisher=

2017
[49]

A complete problem for statistical zero knowledge , author=. J. 2003 , publisher=

2003
[50]

Andreas Galanis and Daniel Stefankovic and Eric Vigoda , title =. Comb. Probab. Comput. , volume =
[51]

FOCS , pages =

Allan Sly and Nike Sun , title =. FOCS , pages =
[52]

arXiv , volume =

Xiaoyu Chen and Weiming Feng and Heng Guo and Xinyuan Zhang and Zongrui Zou , title =. arXiv , volume =
[53]

TheoretiCS , volume =

Weiming Feng and Heng Guo and Mark Jerrum and Jiaheng Wang , title =. TheoretiCS , volume =
[54]

Meel and Dimitrios Myrisiotis and A

Arnab Bhattacharyya and Sutanu Gayen and Kuldeep S. Meel and Dimitrios Myrisiotis and A. Pavan and N. V. Vinodchandran , title =. ICML , publisher =
[55]

Meel and N

Arnab Bhattacharyya and Sutanu Gayen and Kuldeep S. Meel and N. V. Vinodchandran , title =. NeurIPS , year =
[56]

Dyer and Catherine S

Martin E. Dyer and Catherine S. Greenhill , title =. J. Algorithms , volume =
[57]

CoRR , volume =

Xiaoyu Chen and Zongchen Chen and Yitong Yin and Xinyuan Zhang , title =. CoRR , volume =
[58]

Weiming Feng and Heng Guo and Jiaheng Wang , title =. Inf. Comput. , volume =
[59]

Hayes , title =

Thomas P. Hayes , title =. FOCS , pages =
[60]

Amartya Shankha Biswas and Ronitt Rubinfeld and Anak Yodpinyanee , title =
[61]

Log-concave polynomials

Nima Anari and Kuikui Liu and Shayan. Log-concave polynomials. STOC , pages =
[62]

A power law of order

Long, Yun and Nachmias, Asaf and Ning, Weiyang and Peres, Yuval , fjournal =. A power law of order. Mem. Amer. Math. Soc. , number =
[63]

Entropy decay in the

Antonio Blanca and Pietro Caputo and Daniel Parisi and Alistair Sinclair and Eric Vigoda , booktitle =. Entropy decay in the
[64]

Modified log-

Cryan, Mary and Guo, Heng and Mousa, Giorgos , fjournal =. Modified log-. Ann. Probab. , number =
[65]

On Mixing of

Antonio Blanca and Pietro Caputo and Zongchen Chen and Daniel Parisi and Daniel. On Mixing of
[66]

Fortuin, C. M. and Kasteleyn, P. W. , fjournal =. On the random-cluster model. Physica , pages =
[67]

arXiv , title =

Nima Anari and Vishesh Jain and Frederic Koehler and Huy Tuan Pham and Thuy. arXiv , title =
[68]

The Complexity of Ferromagnetic

Leslie Ann Goldberg and Mark Jerrum , journal =. The Complexity of Ferromagnetic
[69]

and Sokal, Alan D

Edwards, Robert G. and Sokal, Alan D. , fjournal =. Generalization of the. Phys. Rev. D (3) , number =
[70]

Wilson , booktitle =

Dana Randall and David B. Wilson , booktitle =. Sampling Spin Configurations of an
[71]

Rapid Mixing

Sejun Park and Yunhun Jang and Andreas Galanis and Jinwoo Shin and Daniel. Rapid Mixing. AISTATS , pages =
[72]

Uniquely colourable graphs and the hardness of colouring graphs of large girth , volume =. Combin. Probab. Comput. , number =
[73]

Rapid Mixing from Spectral Independence beyond the

Weiming Feng and Heng Guo and Yitong Yin and Chihao Zhang , booktitle =. Rapid Mixing from Spectral Independence beyond the
[74]

Rapid Mixing for Colorings via Spectral Independence , year =

Zongchen Chen and Andreas Galanis and Daniel. Rapid Mixing for Colorings via Spectral Independence , year =
[75]

arXiv , author =:2107.03932 , journal =

Perfect sampling for (atomic). arXiv , author =:2107.03932 , journal =

work page arXiv
[76]

Coloring simple hypergraphs , volume =

Frieze, Alan and Mubayi, Dhruv , journal =. Coloring simple hypergraphs , volume =
[77]

Frieze and P

Alan M. Frieze and P. Randomly coloring simple hypergraphs , journal =
[78]

Nonuniversal critical dynamics in

Swendsen, Robert and Wang, Jian-Sheng , issue =. Nonuniversal critical dynamics in. Phys. Rev. Lett. , pages =
[79]

STOC , pages=

Sampling constraint satisfaction solutions in the local lemma regime , author=. STOC , pages=
[80]

On the sampling

Vishesh Jain and Huy Tuan Pham and Thuy Duong Vuong , journal =. On the sampling

Showing first 80 references.