Entropy Equivalence Testing

Cl\'ement L. Canonne; Jonathan Scarlett; Joy Qiping Yang; Yash Pote

arxiv: 2605.23225 · v1 · pith:DMQVQHYSnew · submitted 2026-05-22 · 💻 cs.DS · cs.DM· cs.IT· cs.LG· math.IT· math.ST· stat.TH

Entropy Equivalence Testing

Cl\'ement L. Canonne , Yash Pote , Jonathan Scarlett , Joy Qiping Yang This is my paper

Pith reviewed 2026-05-25 03:14 UTC · model grok-4.3

classification 💻 cs.DS cs.DMcs.ITcs.LGmath.ITmath.STstat.TH

keywords entropy equivalence testingcloseness testingsample complexityBayesian networksdistribution testingShannon entropyproperty testingalgorithmic efficiency

0 comments

The pith

Entropy equivalence testing distinguishes identical distributions from those with entropy difference at least epsilon using fewer samples than closeness testing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces entropy equivalence testing as a relaxation of closeness testing, where the task is to distinguish two unknown distributions p and q being identical from their Shannon entropies differing by at least epsilon. It supplies a time- and sample-efficient algorithm for this problem and establishes that the optimal sample complexity is significantly lower than what closeness testing requires. The result is applied to obtain the first non-trivial tester for whether two low-degree Bayesian networks are identical or differ substantially in total variation distance, improving on full-learning baselines. A sympathetic reader would care because the relaxation focuses computational effort on a single functional rather than the full distance, potentially making testing feasible for structured high-dimensional objects.

Core claim

The authors establish that entropy equivalence testing admits a time- and sample-efficient algorithm whose optimal sample complexity is significantly lower than that of closeness testing. They leverage this algorithm to produce the first non-trivial tester for standard closeness of low-degree Bayesian networks, which improves on the sample or time complexity of baselines that rely on full learning of the networks.

What carries the argument

The entropy equivalence tester, an algorithm that distinguishes p equals q from the absolute difference in Shannon entropy being at least epsilon, given samples from both distributions.

If this is right

The optimal sample complexity for entropy equivalence testing is significantly lower than for closeness testing.
The entropy equivalence tester yields the first non-trivial algorithm for closeness testing of low-degree Bayesian networks.
The new tester for Bayesian networks improves on either the sample or time complexity of full-learning baselines.
Relaxing the testing requirement to a single functional such as entropy can produce more efficient algorithms than testing full distance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same relaxation strategy could be applied to other properties of distributions to obtain efficient testers for additional structured families.
Low-degree Bayesian networks may admit testers based on other simple functionals if the entropy approach generalizes.
The separation in sample complexity between property testing and equality testing may extend to other graphical models beyond Bayesian networks.

Load-bearing premise

That an algorithm distinguishing p equals q from absolute entropy difference at least epsilon can be directly leveraged to obtain a non-trivial tester for standard closeness on low-degree Bayesian networks without additional assumptions on the networks or the distance measure.

What would settle it

A concrete pair of low-degree Bayesian networks where the entropy equivalence tester outputs that the distributions are equivalent yet their total variation distance exceeds epsilon, or a family of distributions where the sample complexity of the entropy tester fails to beat known closeness testing bounds.

read the original abstract

We introduce the problem of \emph{entropy equivalence testing} for probability distributions, a relaxation of the well-studied closeness testing problem, where the distribution testing algorithm is now only required to distinguish, given samples from two unknown distributions $p,q$ and a parameter $\varepsilon \in(0,1/2]$, between $p=q$ and $|H(p)-H(q)| \geq \varepsilon$ (where $H$ denotes the Shannon entropy). We provide a time- and sample-efficient algorithm for this task, showing that the optimal sample complexity for this task can be significantly lower than that of closeness testing. As an application, we leverage this result to provide the first non-trivial testing algorithm for (standard) closeness of low-degree \emph{Bayesian networks}, which significantly improves on either the sample or time complexity of a baseline based on full learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Entropy equivalence testing is a new relaxation with a claimed complexity win over closeness testing, but the Bayesian network application rests on an unproven link from entropy difference to TV distance.

read the letter

The main things to know are that the paper defines entropy equivalence testing—distinguish p=q from |H(p)-H(q)| at least epsilon—and gives an algorithm with better sample complexity than standard closeness testing, then applies it to claim the first non-trivial closeness tester for low-degree Bayesian networks that beats full learning baselines. The new element is the problem definition itself plus the algorithm. The BN use case is presented as a concrete payoff for structured distributions. This is honest progress inside distribution testing. The algorithm is described as time- and sample-efficient, which is the part that looks like a real improvement on the surface. The application is positioned as non-trivial relative to prior closeness work. The soft spot is the bridge in the application. The entropy tester only handles entropy gaps. Turning it into a TV-based closeness tester requires that, inside low-degree Bayesian networks, large TV distance forces a noticeable entropy difference. The abstract gives no sketch of such a structural fact or auxiliary reduction, and we already know counterexamples exist for general discrete distributions. If the paper does not establish this for the BN class, the application claim does not go through. The rest of the abstract presents the work as independent rather than circular. This is for people working on property testing and testing for graphical models. A specialist reader would get value from the new variant and the complexity comparison. It deserves peer review so the reduction step and the bounds can be checked in detail.

Referee Report

2 major / 2 minor

Summary. The paper introduces entropy equivalence testing: given samples from unknown distributions p and q, distinguish p = q from |H(p) - H(q)| ≥ ε. It claims a time- and sample-efficient algorithm whose optimal sample complexity is significantly lower than that of standard closeness testing. As an application, the entropy tester is leveraged to obtain the first non-trivial tester for standard closeness (p = q vs. d_TV(p, q) ≥ ε) of low-degree Bayesian networks, improving on full-learning baselines in sample or time complexity.

Significance. If the entropy-equivalence algorithm is correct and the reduction to BN closeness testing holds, the work would supply a useful relaxation of closeness testing with improved complexity and the first non-trivial tester for a structured class of distributions. The manuscript positions the result as algorithmic rather than information-theoretic and claims concrete efficiency gains over learning-based approaches.

major comments (2)

[Application section / abstract] Application to BN closeness (abstract and § on applications): the entropy tester only distinguishes |H(p)-H(q)| ≥ ε; converting it into a tester for d_TV(p,q) ≥ ε requires a lemma showing that, inside the low-degree BN class, d_TV ≥ ε forces |H(p)-H(q)| ≥ δ(ε) > 0 (or an auxiliary reduction). No such structural property, lemma, or argument ruling out counter-examples with identical entropy but large TV distance is indicated.
[Algorithm section] § on entropy tester algorithm: the claimed sample-complexity improvement over closeness testing is stated as 'significantly lower,' but the manuscript must explicitly compare the derived bound to the known Θ(max(n^{2/3}/ε^{4/3}, √n/ε²)) closeness bound and show where the entropy relaxation yields the saving (e.g., via a concrete theorem statement).

minor comments (2)

[Introduction] Notation: H(p) is used for Shannon entropy without an explicit definition or base in the abstract; add a short reminder in the introduction.
[Application section] The baseline 'full learning' comparison for BNs should cite the precise sample complexity used (e.g., the learning bound for low-degree BNs) so the improvement is quantifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the presentation of the reduction and the complexity comparison.

read point-by-point responses

Referee: [Application section / abstract] Application to BN closeness (abstract and § on applications): the entropy tester only distinguishes |H(p)-H(q)| ≥ ε; converting it into a tester for d_TV(p,q) ≥ ε requires a lemma showing that, inside the low-degree BN class, d_TV ≥ ε forces |H(p)-H(q)| ≥ δ(ε) > 0 (or an auxiliary reduction). No such structural property, lemma, or argument ruling out counter-examples with identical entropy but large TV distance is indicated.

Authors: We agree that an explicit structural lemma is required to complete the reduction from closeness testing to entropy equivalence testing for low-degree Bayesian networks. The current manuscript states the application but does not include the supporting argument. In the revision we will add a dedicated lemma (with proof) in the applications section establishing that, for the class of low-degree BNs, d_TV(p, q) ≥ ε implies |H(p) − H(q)| ≥ δ(ε) for some δ(ε) > 0 that depends only on ε and the degree bound. The lemma will also rule out counter-examples with identical entropy but large TV distance by exploiting the conditional independence structure of low-degree networks. revision: yes
Referee: [Algorithm section] § on entropy tester algorithm: the claimed sample-complexity improvement over closeness testing is stated as 'significantly lower,' but the manuscript must explicitly compare the derived bound to the known Θ(max(n^{2/3}/ε^{4/3}, √n/ε²)) closeness bound and show where the entropy relaxation yields the saving (e.g., via a concrete theorem statement).

Authors: We accept the referee’s observation. While the manuscript derives the sample complexity of the entropy tester and asserts it is significantly lower than standard closeness testing, it does not contain a side-by-side theorem statement. In the revision we will insert a new theorem (immediately after the main entropy-tester result) that states the entropy-equivalence sample complexity, recalls the known Θ(max(n^{2/3}/ε^{4/3}, √n/ε²)) bound for closeness testing, and explicitly identifies the asymptotic regimes in which the entropy relaxation yields the improvement (primarily when the entropy gap can be detected with fewer samples than are needed to distinguish arbitrary pairs at TV distance ε). revision: yes

Circularity Check

0 steps flagged

No circularity: algorithmic result and application are independent constructions

full rationale

The paper defines entropy equivalence testing as a distinct relaxation of closeness testing and claims an efficient algorithm whose sample complexity is shown lower than standard closeness testing. The application to low-degree BN closeness testing is presented as a direct leverage of this algorithm without any reduction to fitted parameters, self-definitional equations, or load-bearing self-citations. No quoted steps equate a claimed prediction or uniqueness result to its own inputs by construction. The derivation chain remains self-contained as an algorithmic contribution; any potential gap in the entropy-to-TV bridge for BNs is a correctness or completeness issue, not circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5692 in / 1048 out tokens · 37001 ms · 2026-05-25T03:14:53.088642+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1]

Optimal Testing for Properties of Distributions

[ADK15] Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. “Optimal Testing for Properties of Distributions”. In:NeurIPS. 2015, pp. 3591–3599 (cit. on p. 2). [AOST15] Jayadev Acharya et al. “The Complexity of Estimating Rényi Entropy”. In:SODA. SIAM, 2015, pp. 1855–1869 (cit. on p. 5). [BCGM25] Arnab Bhattacharyya et al. “Learnability of Paramet...

work page 2015
[2]

Near-Optimal Learning of Tree-Structured Distributions by Chow and Liu

2025, pp. 21411–21432 (cit. on p. 6). [BGP+23] Arnab Bhattacharyya et al. “Near-Optimal Learning of Tree-Structured Distributions by Chow and Liu”. In:SIAM J. Comput.52.3 (2023), pp. 761–793 (cit. on p. 6). [Can20a] Clément L. Canonne.A short note on learning discrete distributions

work page 2025
[3]

arXiv: 2002.11457 [math.ST].url:https://arxiv.org/abs/2002.11457(cit. on p. 13). [Can20b] Clément L. Canonne.A Survey on Distribution Testing: Your Data is Big. But is it Blue?Graduate Surveys

work page arXiv 2002
[4]

Topics and Techniques in Distribution Testing: A Biased but Representative Sample

Theory of Computing Library, 2020, pp. 1–100.doi: 10.4086/toc.gs.2020.009(cit. on p. 1). [Can22] Clément L. Canonne. “Topics and Techniques in Distribution Testing: A Biased but Representative Sample”. In:Found. Trends Commun. Inf. Theory19.6 (2022), pp. 1032– 1198 (cit. on p. 1). [CDKS17] Clément L Canonne et al. “Testing Bayesian Networks”. In:Conferenc...

work page doi:10.4086/toc.gs.2020.009(cit 2020
[5]

PropertyTestinganditsConnection to Learning and Approximation

Proceedings of Machine Learning Research. PMLR, 2017, pp. 697–703 (cit. on p. 7). [GGR98] OdedGoldreich,ShafiGoldwasser,andDanaRon. “PropertyTestinganditsConnection to Learning and Approximation”. In:J. ACM45.4 (1998), pp. 653–750 (cit. on p. 1). [Gol17] Oded Goldreich.Introduction to Property Testing. Cambridge University Press, 2017 (cit. on p. 1). [JVH...

work page 2017

[1] [1]

Optimal Testing for Properties of Distributions

[ADK15] Jayadev Acharya, Constantinos Daskalakis, and Gautam Kamath. “Optimal Testing for Properties of Distributions”. In:NeurIPS. 2015, pp. 3591–3599 (cit. on p. 2). [AOST15] Jayadev Acharya et al. “The Complexity of Estimating Rényi Entropy”. In:SODA. SIAM, 2015, pp. 1855–1869 (cit. on p. 5). [BCGM25] Arnab Bhattacharyya et al. “Learnability of Paramet...

work page 2015

[2] [2]

Near-Optimal Learning of Tree-Structured Distributions by Chow and Liu

2025, pp. 21411–21432 (cit. on p. 6). [BGP+23] Arnab Bhattacharyya et al. “Near-Optimal Learning of Tree-Structured Distributions by Chow and Liu”. In:SIAM J. Comput.52.3 (2023), pp. 761–793 (cit. on p. 6). [Can20a] Clément L. Canonne.A short note on learning discrete distributions

work page 2025

[3] [3]

arXiv: 2002.11457 [math.ST].url:https://arxiv.org/abs/2002.11457(cit. on p. 13). [Can20b] Clément L. Canonne.A Survey on Distribution Testing: Your Data is Big. But is it Blue?Graduate Surveys

work page arXiv 2002

[4] [4]

Topics and Techniques in Distribution Testing: A Biased but Representative Sample

Theory of Computing Library, 2020, pp. 1–100.doi: 10.4086/toc.gs.2020.009(cit. on p. 1). [Can22] Clément L. Canonne. “Topics and Techniques in Distribution Testing: A Biased but Representative Sample”. In:Found. Trends Commun. Inf. Theory19.6 (2022), pp. 1032– 1198 (cit. on p. 1). [CDKS17] Clément L Canonne et al. “Testing Bayesian Networks”. In:Conferenc...

work page doi:10.4086/toc.gs.2020.009(cit 2020

[5] [5]

PropertyTestinganditsConnection to Learning and Approximation

Proceedings of Machine Learning Research. PMLR, 2017, pp. 697–703 (cit. on p. 7). [GGR98] OdedGoldreich,ShafiGoldwasser,andDanaRon. “PropertyTestinganditsConnection to Learning and Approximation”. In:J. ACM45.4 (1998), pp. 653–750 (cit. on p. 1). [Gol17] Oded Goldreich.Introduction to Property Testing. Cambridge University Press, 2017 (cit. on p. 1). [JVH...

work page 2017