Breaking the Finite-Sample Barrier in Entropy Coupling

Jun Chen; Shahab Asoodeh

arxiv: 2605.16229 · v1 · pith:JN7JPMFPnew · submitted 2026-05-15 · 💻 cs.IT · math.IT· math.ST· stat.ML· stat.TH

Breaking the Finite-Sample Barrier in Entropy Coupling

Shahab Asoodeh , Jun Chen This is my paper

Pith reviewed 2026-05-19 18:26 UTC · model grok-4.3

classification 💻 cs.IT math.ITmath.STstat.MLstat.TH

keywords entropy couplingconditional entropyfinite-sample recoverydiscrete distributionsdependent observationsrandomness extractionrepresentation learning

0 comments

The pith

Dependence among fixed-marginal observations can drive conditional entropy to zero after finitely many samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the minimum list entropy coupling as the smallest possible conditional entropy of a discrete random variable X given a collection of other variables whose marginal distributions are fixed in advance. It shows that when the collection is allowed to be arbitrarily dependent, the conditional entropy can reach exactly zero once the number of variables exceeds a finite threshold determined by the support of X. In the independent case the same quantity only decays exponentially. Under mild conditions on the supports this threshold is on the order of the logarithm of one over the smallest positive probability mass of X. The result is used to characterize exact recovery and exact extraction in two applied settings.

Core claim

The minimum list entropy coupling H(P || Q1, …, Qm) is defined as the infimum of H(X | Y1, …, Ym) over all joints that realize the prescribed marginals X ~ P and Yi ~ Qi. When the Yi are permitted to depend on one another while keeping each marginal fixed, this quantity reaches zero for sufficiently large but finite m. Necessary and sufficient conditions for the zero-entropy regime are given, together with the explicit bound m = O(log(1/Pmin)) under standard support assumptions on the discrete marginals.

What carries the argument

The minimum list entropy coupling, which minimizes conditional entropy while enforcing fixed marginals and allowing arbitrary dependence among the conditioning variables.

If this is right

Exact recovery of X becomes possible after a finite number of dependent samples whose marginals are prescribed.
The same construction yields exact randomness extraction once the zero-entropy regime is reached.
Distribution-matching representation learning admits finite-sample exact matching under the identified support conditions.
A greedy algorithm computes the minimum list entropy coupling with monotone approximation guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The finite-sample exactness result may extend to other inverse problems in which one observes several noisy versions of a discrete signal and can choose their joint law.
In practice one would need to verify whether the required dependence among observations can be realized by a physical or algorithmic channel.
The O(log(1/Pmin)) scaling suggests that the number of observations needed grows slowly with the dynamic range of the distribution.

Load-bearing premise

The Y variables can be coupled in any way that preserves each individual marginal distribution exactly.

What would settle it

A concrete discrete distribution P together with marginals Qi for which the support conditions hold, yet H(X | Y1, …, Ym) remains strictly positive for every finite m and every valid joint.

Figures

Figures reproduced from arXiv: 2605.16229 by Jun Chen, Shahab Asoodeh.

**Figure 1.** Figure 1: Hm(P∥Q) in the Bernoulli case with (left) P = Ber(0.45) and Q = Ber(0.2) and (right) P = Ber(0.3) and Q = Ber(0.05). The exact value is due to Proposition 2 and the upper bound and lower bound are from Theorem 1 and Theorem 7. The blue curve “Estimate” is the output of Algorithm 1. The vertical line refers to the upper bound on m0(P∥Q) from Theorem 2. For the right figure, this bound is m0(P∥Q) ≤ 56. Thus,… view at source ↗

**Figure 2.** Figure 2: Initialization controls convergence. For two nonbinary instances, we plot the output [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Marginal-constrained representation learning. A stochastic encoder [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Hierarchy of coupling structures for finite-sample exact recovery. Independent and conditionally i.i.d. couplings are highly restricted. [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

**Figure 5.** Figure 5: Binary validation against the closed-form solution. Each cell shows the mean absolute error of the monotone greedy estimate of [PITH_FULL_IMAGE:figures/full_fig_p026_5.png] view at source ↗

**Figure 6.** Figure 6: Nonbinary bound sandwich under asymmetric marginals. We plot the monotone greedy estimate of [PITH_FULL_IMAGE:figures/full_fig_p027_6.png] view at source ↗

**Figure 7.** Figure 7: Initialization matters. For two nonbinary instances, we compare Algorithm [PITH_FULL_IMAGE:figures/full_fig_p028_7.png] view at source ↗

read the original abstract

Dependence among marginally constrained observations can break a finite-sample barrier. To formalize this phenomenon, we introduce the \emph{minimum list entropy coupling} $H(P\|Q_1,\dots,Q_m)$, the minimum conditional entropy $H(X|Y_1,\dots,Y_m)$ over all joint distributions with prescribed discrete marginals $X\sim P$ and $Y_i\sim Q_i$. Unlike classical formulations based on independent observations, our model allows $Y_1,\dots,Y_m$ to be arbitrarily dependent while keeping each marginal fixed. This enlarged coupling space reveals a sharp dichotomy: independent observations reduce residual uncertainty exponentially, whereas dependent observations can eliminate it exactly after finitely many samples. We characterize this zero-entropy regime through necessary and sufficient conditions and give concrete structural criteria under which it occurs. In particular, under mild support assumptions, zero entropy is achieved with $O(\log(1/P_{\min}))$ observations, where $P_{\min}$ is the minimum nonzero mass of $P$. We also develop a greedy algorithm with monotone approximation guarantees for computing $H(P\|Q_1,\dots,Q_m)$. Finally, we show that the same framework formalizes finite-sample limits in distribution-matching representation learning and randomness extraction, where zero entropy corresponds to exact recovery and exact extraction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's core advance is a minimum list entropy coupling that uses dependent observations to hit exact zero conditional entropy in O(log(1/P_min)) samples under mild support conditions, with a clean dichotomy versus the independent case.

read the letter

The main thing to know is that this work formalizes how dependence among observations with fixed marginals can drive conditional entropy to exactly zero after finitely many samples instead of just decaying exponentially. They define the minimum list entropy coupling as the smallest H(X|Y1..Ym) over all joints that keep the marginals exactly P and Qi. This enlarged space produces a sharp split: independent Y's give the usual exponential reduction, while dependent ones can eliminate residual uncertainty completely in finite time. Under mild support assumptions they bound the number of observations by O(log(1/P_min)), give necessary and sufficient conditions plus structural criteria for the zero-entropy regime, and supply a greedy algorithm with monotone approximation guarantees. The same setup directly models exact recovery in distribution-matching representation learning and exact extraction in randomness generation. The framework stays self-contained and the citations look standard rather than circular. The math rests on ordinary discrete probability and information-theoretic arguments, which is appropriate for the setting. One spot worth checking in the proofs is whether the zero-entropy characterization fully incorporates the linear constraints that arise when the probability masses are linearly independent over the rationals. The stress-test note points out that even large supports may not suffice if the target vector P lies outside the inducible polytope; the paper's claim of necessary and sufficient conditions suggests the structural criteria address this, but it is worth confirming the algebraic condition is not left implicit. This is aimed at information theorists and statistical learning researchers who work on finite-sample coupling or exact matching problems. A reader already comfortable with entropy coupling and polytope arguments will get the most out of it. The idea is specific enough and the claims are testable enough that it deserves a serious referee.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the minimum list entropy coupling H(P || Q_1, …, Q_m) as the infimum of H(X | Y_1, …, Y_m) over all joints with prescribed marginals X ∼ P and Y_i ∼ Q_i, where the Y_i may be arbitrarily dependent. It claims necessary-and-sufficient conditions for the zero-entropy regime, shows that under mild support assumptions this regime is attained with O(log(1/P_min)) observations, supplies a greedy algorithm with monotone approximation guarantees, and applies the framework to exact recovery in distribution-matching representation learning and randomness extraction.

Significance. If the central claims are correct, the work is significant: it supplies a new coupling notion that separates the effect of dependence from independence, demonstrates finite-sample exact recovery that is impossible under the classical independent-observation model, and gives a concrete algorithmic tool together with falsifiable structural criteria. The introduction of an invented entity (minimum list entropy coupling) with explicit characterizations is a clear strength.

major comments (2)

[§3] §3 (characterization of zero-entropy regime): the necessary-and-sufficient conditions are stated in terms of support size and marginal constraints, yet the polytope of measures inducible by couplings of the Q_i may be empty when the masses of P and the Q_i are linearly independent over ℚ. This algebraic obstruction is not implied by the paper’s mild support assumptions alone and directly affects whether H(P || Q_1, …, Q_m) = 0 is attainable for any finite m.
[Abstract and §4] Abstract and §4 (O(log(1/P_min)) scaling): the claimed sample complexity is presented as following from the zero-entropy characterization, but the derivation appears to presuppose a specific choice of the Q_i that realizes the coupling; without an explicit construction or bound on how the Q_i must be chosen, the scaling claim is not yet load-bearing.

minor comments (2)

[Notation] The notation H(P || Q_1, …, Q_m) is introduced without an immediate comparison to the classical conditional entropy H(X | Y) when the Y_i are forced to be independent; a short clarifying sentence would help readers.
[Algorithm description] The greedy algorithm’s monotone guarantee is asserted but the proof is only sketched; adding the explicit submodularity or matroid argument used would strengthen the algorithmic section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major comment below and indicate planned revisions to improve clarity and rigor.

read point-by-point responses

Referee: [§3] §3 (characterization of zero-entropy regime): the necessary-and-sufficient conditions are stated in terms of support size and marginal constraints, yet the polytope of measures inducible by couplings of the Q_i may be empty when the masses of P and the Q_i are linearly independent over ℚ. This algebraic obstruction is not implied by the paper’s mild support assumptions alone and directly affects whether H(P || Q_1, …, Q_m) = 0 is attainable for any finite m.

Authors: We appreciate the referee for identifying this algebraic subtlety. The definition of H(P || Q_1, …, Q_m) is the infimum of H(X | Y_1, …, Y_m) over the (possibly empty) set of joints with the prescribed marginals; when no such joint exists the quantity is undefined. Our necessary-and-sufficient conditions in §3 are stated under the standing assumption that at least one coupling exists. The mild support assumptions in the paper are used to guarantee existence for the specific family of Q_i constructed later in the manuscript. To make the presentation fully rigorous we will add an explicit preliminary remark in §3 stating that the zero-entropy regime presupposes non-emptiness of the coupling polytope (i.e., linear dependence over ℚ of the relevant probability masses is required for compatibility). This clarification does not alter the main theorems but removes any ambiguity. revision: yes
Referee: [Abstract and §4] Abstract and §4 (O(log(1/P_min)) scaling): the claimed sample complexity is presented as following from the zero-entropy characterization, but the derivation appears to presuppose a specific choice of the Q_i that realizes the coupling; without an explicit construction or bound on how the Q_i must be chosen, the scaling claim is not yet load-bearing.

Authors: The O(log(1/P_min)) bound is derived from an explicit, constructive choice of the dependent marginals Q_i that satisfy the support conditions of the zero-entropy characterization. Section 4 presents a greedy procedure that, at each step, selects a Q_i whose support is a suitable subset of the atoms of P; the number of steps until the residual entropy reaches zero is then bounded by O(log(1/P_min)) under the stated mild support assumptions. We agree that the dependence on this particular construction could be stated more prominently. In the revision we will (i) isolate the construction as a standalone lemma in §4 and (ii) add a short paragraph in the abstract and introduction that explicitly ties the sample-complexity claim to this construction, thereby making the scaling self-contained. revision: yes

Circularity Check

0 steps flagged

No significant circularity; self-contained new definition and characterizations

full rationale

The paper defines the minimum list entropy coupling H(P||Q1,...,Qm) as the minimum conditional entropy over joints with fixed marginals, then derives necessary and sufficient conditions for the zero-entropy regime directly from this enlarged coupling space and the resulting polytope of inducible measures. The O(log(1/P_min)) bound and greedy algorithm follow from these structural criteria under the stated mild support assumptions, without any reduction to fitted parameters, self-citations, or prior results by the same authors. The derivation chain is therefore independent and self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claims rest on standard discrete probability axioms and the definition of conditional entropy over an enlarged joint space; no free parameters are introduced, and the only invented entity is the minimum list entropy coupling itself, which is defined directly from the marginal constraints.

axioms (2)

domain assumption Discrete probability measures with finite support and positive minimum mass P_min
Invoked when stating the O(log(1/P_min)) scaling and zero-entropy regime under mild support assumptions.
standard math Existence of joint distributions realizing any prescribed marginals
Standard result from probability theory used to define the coupling space.

invented entities (1)

Minimum list entropy coupling H(P||Q1,...,Qm) no independent evidence
purpose: Quantifies minimum conditional entropy over joints with fixed marginals allowing arbitrary dependence
Newly defined quantity that formalizes the enlarged coupling space; no independent evidence outside the definition is provided.

pith-pipeline@v0.9.0 · 5764 in / 1506 out tokens · 26073 ms · 2026-05-19T18:26:47.782679+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1]

Lindvall,Lectures on the Coupling Method

T. Lindvall,Lectures on the Coupling Method. Dover Publications, 2002

work page 2002
[2]

A metric between probability distributions on finite sets of different cardinalities,

M. Vidyasagar, “A metric between probability distributions on finite sets of different cardinalities,”IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 334–345, 2012

work page 2012
[3]

How to find a joint probability distribution of minimum entropy (almost) given the marginals,

F. Cicalese, L. Gargano, and U. Vaccaro, “How to find a joint probability distribution of minimum entropy (almost) given the marginals,” in2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 2173–2177

work page 2017
[4]

Minimum-entropy couplings and their applications,

——, “Minimum-entropy couplings and their applications,”IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3436–3451, 2019

work page 2019
[5]

Computing low-entropy couplings for large-support distributions,

S. Sokota, D. Sam, C. Schroeder de Witt, S. Compton, J. Foerster, and J. Z. Kolter, “Computing low-entropy couplings for large-support distributions,” inProceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, ser. Proceedings of Machine Learning Research, N. Kiyavash and J. M. Mooij, Eds., vol. 244, 15–19 Jul 2024, pp. 3279–3298

work page 2024
[6]

Information spectrum converse for minimum entropy couplings and functional representations,

Y . Y . Shkel and A. Kumar Yadav, “Information spectrum converse for minimum entropy couplings and functional representations,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 66–71

work page 2023
[7]

Minimum entropy coupling with bottleneck,

M. Ebrahimi, J. Chen, and A. J. Khisti, “Minimum entropy coupling with bottleneck,” inThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Online]. Available: https://openreview.net/forum?id=YlmYm7sHDE

work page 2024
[8]

Memoryless representation of markov processes,

A. Painsky, S. Rosset, and M. Feder, “Memoryless representation of markov processes,” in2013 IEEE International Symposium on Information Theory (ISIT). IEEE, 2013, pp. 2294–2298

work page 2013
[9]

Asymptotic coupling and its applications in information theory,

L. Yu and V . Y . F. Tan, “Asymptotic coupling and its applications in information theory,”IEEE Transactions on Information Theory, vol. 65, no. 3, pp. 1321–1344, 2019

work page 2019
[10]

Greedy additive approximation algorithms for minimum-entropy coupling problem,

M. Rossi, “Greedy additive approximation algorithms for minimum-entropy coupling problem,” in2019 IEEE International Symposium on Information Theory (ISIT). IEEE, 2019, pp. 1127–1131

work page 2019
[11]

On the entropy of couplings,

M. Kova ˇcevi´c, I. Stanojevi ´c, and V . Šenk, “On the entropy of couplings,”Information and Computation, vol. 242, pp. 369–382, 2015

work page 2015
[12]

Entropic causal inference,

M. Kocaoglu, A. G. Dimakis, S. Vishwanath, and B. Hassibi, “Entropic causal inference,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017

work page 2017
[13]

Entropic causality and greedy minimum entropy coupling,

——, “Entropic causality and greedy minimum entropy coupling,” in2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 1465–1469

work page 2017
[14]

Quantum entropic causal inference,

M. A. Javidian, V . Aggarwal, F. Bao, and Z. Jacob, “Quantum entropic causal inference,” inQuantum Information and Measurement VI 2021. Optica Publishing Group, 2021, p. F2C.3. [Online]. Available: https://opg.optica.org/abstract.cfm?URI=QIM-2021-F2C.3

work page 2021
[15]

Learning to match unpaired data with minimum entropy coupling,

M. Bounoua, G. Franzese, and P. Michiardi, “Learning to match unpaired data with minimum entropy coupling,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=lWcM04ExOD

work page 2025
[16]

Fundamental limits of perfect concept erasure,

S. B. R. Chowdhury, A. Dubey, A. Beirami, R. Kidambi, N. Monath, A. Ahmed, and S. Chaturvedi, “Fundamental limits of perfect concept erasure,” inProceedings of The 28th International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, vol. 258, 2025, pp. 901–909. [Online]. Available: https://proceedings.mlr...

work page 2025
[17]

Perfectly secure steganography using minimum entropy coupling,

C. Schroeder de Witt, S. Sokota, M. Strohmeier, Z. Kolter, and J. Foerster, “Perfectly secure steganography using minimum entropy coupling,” in International Conference on Learning Representations (ICLR), 2023. [Online]. Available: https://arxiv.org/abs/2210.14889

work page arXiv 2023
[18]

Minimum entropy coupling and its applications,

F. Cicalese, L. Gargano, and U. Vaccaro, “Minimum entropy coupling and its applications,” inProceedings of the IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 1479–1483. 9

work page 2017
[19]

Communicating via Markov decision processes,

S. Sokota, C. A. S. De Witt, M. Igl, L. M. Zintgraf, P. Torr, M. Strohmeier, Z. Kolter, S. Whiteson, and J. Foerster, “Communicating via Markov decision processes,” inProceedings of the 39th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, ...

work page 2022
[20]

Efficient approximate minimum entropy coupling of multiple probability distributions,

C. T. Li, “Efficient approximate minimum entropy coupling of multiple probability distributions,”IEEE Transactions on Information Theory, vol. 67, no. 8, pp. 5259–5279, 2021, arXiv:2006.07955

work page arXiv 2021
[21]

A tighter approximation guarantee for greedy minimum entropy coupling,

S. Compton, “A tighter approximation guarantee for greedy minimum entropy coupling,” in2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 168–173

work page 2022
[22]

Minimum-entropy coupling approximation guarantees beyond the majorization barrier,

S. Compton, D. Katz, B. Qi, K. Greenewald, and M. Kocaoglu, “Minimum-entropy coupling approximation guarantees beyond the majorization barrier,” inProceedings of The 26th International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, F. Ruiz, J. Dy, and J.-W. van de Meent, Eds., vol. 206, 25–27 Apr 2023,...

work page 2023
[23]

Various techniques used in connection with random digits,

J. von Neumann, “Various techniques used in connection with random digits,” inCollected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis. New York: Pergamon Press, 1963, summary written by George E. Forsythe; originally 1951

work page 1963
[24]

The efficient construction of an unbiased random sequence,

P. Elias, “The efficient construction of an unbiased random sequence,”The Annals of Mathematical Statistics, vol. 43, no. 3, pp. 865–870, 1972

work page 1972
[25]

Iterating von neumann’s procedure for extracting random bits,

Y . Peres, “Iterating von neumann’s procedure for extracting random bits,”The Annals of Statistics, vol. 20, no. 1, pp. 590–597, 1992

work page 1992
[26]

How to share a secret,

A. Shamir, “How to share a secret,”Communications of the ACM, vol. 22, no. 11, pp. 612–613, 1979. APPENDIXA MOREDETAILS ONMARGINALLY-CONSTRAINEDREPRESENTATIONLEARNING We consider a representation learning problem in which the learned representation must satisfy prescribed marginal constraints. LetX∼Pbe a discrete source, and let an encoder mapXto a repres...

work page 1979
[27]

Proof.For simplicity assume thatn= m 2ℓ is an integer

There exists m∗ ≤2ℓ log(8/Pmin) −log(1−reqmin) such thatδ m(P∥Q) = 0for allm≥m ∗. Proof.For simplicity assume thatn= m 2ℓ is an integer. Apply the coupling constructed in the proof of Proposition 3. For m= 2nℓ, this yields a deterministic functionfsuch that P(X̸=f(Y m))≤δ n, δ n ≤2(1−req min)n. Sincem= 2nℓ, this gives δm(P∥Q)≤2 exp − m 2ℓ · −log(1−req min...

work page
[28]

The variablesY 1, . . . , Ym ∼Ber(q) can collectively allocate at mostmqmass to the event{Y m ̸= 0}, since1{Y m ̸= 0} ≤ Pm i=1 Yi and thus the nonzero configurations ofY m provide at mostmqtotal mass on whichX= 1can be encoded deterministically away from the all-zero atom. To recoverX∼Ber(p)exactly, the coupling must allocate total masspto the event{X= 1}...

work page
[29]

Hence H1(Ber(p)∥Ber(q)) = (1−q)h b p−q 1−q whenp > q

Sinceh b is increasing on[0, 1 2], it follows that hb p−q 1−q ≤h b 1−p−q 1−q =h b p 1−q . Hence H1(Ber(p)∥Ber(q)) = (1−q)h b p−q 1−q whenp > q. Combining the two cases, for all0≤p, q≤ 1 2, H1(Ber(p)∥Ber(q)) =    q hb p q , p≤q, (1−q)h b p−q 1−q , p > q. APPENDIXD PROOFS OFRESULTS INSECTIONIII We begin by proving Theorem 1. However, instead of provi...

work page
[30]

, Ym ∼Qi.i.d

Independent coupling:Y 1, . . . , Ym ∼Qi.i.d. and independent ofX

work page
[31]

coupling:Y i|X=x∼K x i.i.d

Conditionally i.i.d. coupling:Y i|X=x∼K x i.i.d. acrossiwith kernels{K x}x∈X

work page
[32]

Conditionally independent coupling:Y i|X=x∼K i,x independently acrossi, but not necessarily identically distributed with kernels{K i,x}i∈[m],x∈X

work page
[33]

These classes form a strict hierarchy, and, as we show, each step unlocks a fundamentally different mechanism for recovering X

General coupling: arbitrary joint distribution with marginalsY i ∼Q. These classes form a strict hierarchy, and, as we show, each step unlocks a fundamentally different mechanism for recovering X. Independent coupling:IfY m ∼Q ⊗m is independent ofX∼P, thenY m carries no information aboutX, and hence H(X|Y m) =H(P)for allm. In this regime, increasingmonly ...

work page
[34]

Under general couplings,H 2(P∥Q) = 0

work page
[35]

covered” part and a “miss

Under any conditionally independent coupling with marginalsY 1, Y2 ∼Q, we haveH(X|Y 1, Y2)>0. Proof.For the general-coupling claim, let(Y 1, Y2)have joint law Pr(Y1 = 0, Y2 = 0) = 1 3 ,Pr(Y 1 = 0, Y2 = 1) = Pr(Y1 = 1, Y2 = 0) = 1 6 ,Pr(Y 1 = 1, Y2 = 1) = 1 3 . 23 ThenY 1, Y2 ∼Ber(1/2). DefineXdeterministically by X=    1,(Y 1, Y2) = (0,0), 2,(Y 1, Y2...

work page

[1] [1]

Lindvall,Lectures on the Coupling Method

T. Lindvall,Lectures on the Coupling Method. Dover Publications, 2002

work page 2002

[2] [2]

A metric between probability distributions on finite sets of different cardinalities,

M. Vidyasagar, “A metric between probability distributions on finite sets of different cardinalities,”IEEE Transactions on Information Theory, vol. 58, no. 1, pp. 334–345, 2012

work page 2012

[3] [3]

How to find a joint probability distribution of minimum entropy (almost) given the marginals,

F. Cicalese, L. Gargano, and U. Vaccaro, “How to find a joint probability distribution of minimum entropy (almost) given the marginals,” in2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 2173–2177

work page 2017

[4] [4]

Minimum-entropy couplings and their applications,

——, “Minimum-entropy couplings and their applications,”IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3436–3451, 2019

work page 2019

[5] [5]

Computing low-entropy couplings for large-support distributions,

S. Sokota, D. Sam, C. Schroeder de Witt, S. Compton, J. Foerster, and J. Z. Kolter, “Computing low-entropy couplings for large-support distributions,” inProceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, ser. Proceedings of Machine Learning Research, N. Kiyavash and J. M. Mooij, Eds., vol. 244, 15–19 Jul 2024, pp. 3279–3298

work page 2024

[6] [6]

Information spectrum converse for minimum entropy couplings and functional representations,

Y . Y . Shkel and A. Kumar Yadav, “Information spectrum converse for minimum entropy couplings and functional representations,” in2023 IEEE International Symposium on Information Theory (ISIT), 2023, pp. 66–71

work page 2023

[7] [7]

Minimum entropy coupling with bottleneck,

M. Ebrahimi, J. Chen, and A. J. Khisti, “Minimum entropy coupling with bottleneck,” inThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Online]. Available: https://openreview.net/forum?id=YlmYm7sHDE

work page 2024

[8] [8]

Memoryless representation of markov processes,

A. Painsky, S. Rosset, and M. Feder, “Memoryless representation of markov processes,” in2013 IEEE International Symposium on Information Theory (ISIT). IEEE, 2013, pp. 2294–2298

work page 2013

[9] [9]

Asymptotic coupling and its applications in information theory,

L. Yu and V . Y . F. Tan, “Asymptotic coupling and its applications in information theory,”IEEE Transactions on Information Theory, vol. 65, no. 3, pp. 1321–1344, 2019

work page 2019

[10] [10]

Greedy additive approximation algorithms for minimum-entropy coupling problem,

M. Rossi, “Greedy additive approximation algorithms for minimum-entropy coupling problem,” in2019 IEEE International Symposium on Information Theory (ISIT). IEEE, 2019, pp. 1127–1131

work page 2019

[11] [11]

On the entropy of couplings,

M. Kova ˇcevi´c, I. Stanojevi ´c, and V . Šenk, “On the entropy of couplings,”Information and Computation, vol. 242, pp. 369–382, 2015

work page 2015

[12] [12]

Entropic causal inference,

M. Kocaoglu, A. G. Dimakis, S. Vishwanath, and B. Hassibi, “Entropic causal inference,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 31, 2017

work page 2017

[13] [13]

Entropic causality and greedy minimum entropy coupling,

——, “Entropic causality and greedy minimum entropy coupling,” in2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 1465–1469

work page 2017

[14] [14]

Quantum entropic causal inference,

M. A. Javidian, V . Aggarwal, F. Bao, and Z. Jacob, “Quantum entropic causal inference,” inQuantum Information and Measurement VI 2021. Optica Publishing Group, 2021, p. F2C.3. [Online]. Available: https://opg.optica.org/abstract.cfm?URI=QIM-2021-F2C.3

work page 2021

[15] [15]

Learning to match unpaired data with minimum entropy coupling,

M. Bounoua, G. Franzese, and P. Michiardi, “Learning to match unpaired data with minimum entropy coupling,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=lWcM04ExOD

work page 2025

[16] [16]

Fundamental limits of perfect concept erasure,

S. B. R. Chowdhury, A. Dubey, A. Beirami, R. Kidambi, N. Monath, A. Ahmed, and S. Chaturvedi, “Fundamental limits of perfect concept erasure,” inProceedings of The 28th International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, vol. 258, 2025, pp. 901–909. [Online]. Available: https://proceedings.mlr...

work page 2025

[17] [17]

Perfectly secure steganography using minimum entropy coupling,

C. Schroeder de Witt, S. Sokota, M. Strohmeier, Z. Kolter, and J. Foerster, “Perfectly secure steganography using minimum entropy coupling,” in International Conference on Learning Representations (ICLR), 2023. [Online]. Available: https://arxiv.org/abs/2210.14889

work page arXiv 2023

[18] [18]

Minimum entropy coupling and its applications,

F. Cicalese, L. Gargano, and U. Vaccaro, “Minimum entropy coupling and its applications,” inProceedings of the IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 1479–1483. 9

work page 2017

[19] [19]

Communicating via Markov decision processes,

S. Sokota, C. A. S. De Witt, M. Igl, L. M. Zintgraf, P. Torr, M. Strohmeier, Z. Kolter, S. Whiteson, and J. Foerster, “Communicating via Markov decision processes,” inProceedings of the 39th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, ...

work page 2022

[20] [20]

Efficient approximate minimum entropy coupling of multiple probability distributions,

C. T. Li, “Efficient approximate minimum entropy coupling of multiple probability distributions,”IEEE Transactions on Information Theory, vol. 67, no. 8, pp. 5259–5279, 2021, arXiv:2006.07955

work page arXiv 2021

[21] [21]

A tighter approximation guarantee for greedy minimum entropy coupling,

S. Compton, “A tighter approximation guarantee for greedy minimum entropy coupling,” in2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 168–173

work page 2022

[22] [22]

Minimum-entropy coupling approximation guarantees beyond the majorization barrier,

S. Compton, D. Katz, B. Qi, K. Greenewald, and M. Kocaoglu, “Minimum-entropy coupling approximation guarantees beyond the majorization barrier,” inProceedings of The 26th International Conference on Artificial Intelligence and Statistics, ser. Proceedings of Machine Learning Research, F. Ruiz, J. Dy, and J.-W. van de Meent, Eds., vol. 206, 25–27 Apr 2023,...

work page 2023

[23] [23]

Various techniques used in connection with random digits,

J. von Neumann, “Various techniques used in connection with random digits,” inCollected Works, Volume V: Design of Computers, Theory of Automata and Numerical Analysis. New York: Pergamon Press, 1963, summary written by George E. Forsythe; originally 1951

work page 1963

[24] [24]

The efficient construction of an unbiased random sequence,

P. Elias, “The efficient construction of an unbiased random sequence,”The Annals of Mathematical Statistics, vol. 43, no. 3, pp. 865–870, 1972

work page 1972

[25] [25]

Iterating von neumann’s procedure for extracting random bits,

Y . Peres, “Iterating von neumann’s procedure for extracting random bits,”The Annals of Statistics, vol. 20, no. 1, pp. 590–597, 1992

work page 1992

[26] [26]

How to share a secret,

A. Shamir, “How to share a secret,”Communications of the ACM, vol. 22, no. 11, pp. 612–613, 1979. APPENDIXA MOREDETAILS ONMARGINALLY-CONSTRAINEDREPRESENTATIONLEARNING We consider a representation learning problem in which the learned representation must satisfy prescribed marginal constraints. LetX∼Pbe a discrete source, and let an encoder mapXto a repres...

work page 1979

[27] [27]

Proof.For simplicity assume thatn= m 2ℓ is an integer

There exists m∗ ≤2ℓ log(8/Pmin) −log(1−reqmin) such thatδ m(P∥Q) = 0for allm≥m ∗. Proof.For simplicity assume thatn= m 2ℓ is an integer. Apply the coupling constructed in the proof of Proposition 3. For m= 2nℓ, this yields a deterministic functionfsuch that P(X̸=f(Y m))≤δ n, δ n ≤2(1−req min)n. Sincem= 2nℓ, this gives δm(P∥Q)≤2 exp − m 2ℓ · −log(1−req min...

work page

[28] [28]

The variablesY 1, . . . , Ym ∼Ber(q) can collectively allocate at mostmqmass to the event{Y m ̸= 0}, since1{Y m ̸= 0} ≤ Pm i=1 Yi and thus the nonzero configurations ofY m provide at mostmqtotal mass on whichX= 1can be encoded deterministically away from the all-zero atom. To recoverX∼Ber(p)exactly, the coupling must allocate total masspto the event{X= 1}...

work page

[29] [29]

Hence H1(Ber(p)∥Ber(q)) = (1−q)h b p−q 1−q whenp > q

Sinceh b is increasing on[0, 1 2], it follows that hb p−q 1−q ≤h b 1−p−q 1−q =h b p 1−q . Hence H1(Ber(p)∥Ber(q)) = (1−q)h b p−q 1−q whenp > q. Combining the two cases, for all0≤p, q≤ 1 2, H1(Ber(p)∥Ber(q)) =    q hb p q , p≤q, (1−q)h b p−q 1−q , p > q. APPENDIXD PROOFS OFRESULTS INSECTIONIII We begin by proving Theorem 1. However, instead of provi...

work page

[30] [30]

, Ym ∼Qi.i.d

Independent coupling:Y 1, . . . , Ym ∼Qi.i.d. and independent ofX

work page

[31] [31]

coupling:Y i|X=x∼K x i.i.d

Conditionally i.i.d. coupling:Y i|X=x∼K x i.i.d. acrossiwith kernels{K x}x∈X

work page

[32] [32]

Conditionally independent coupling:Y i|X=x∼K i,x independently acrossi, but not necessarily identically distributed with kernels{K i,x}i∈[m],x∈X

work page

[33] [33]

These classes form a strict hierarchy, and, as we show, each step unlocks a fundamentally different mechanism for recovering X

General coupling: arbitrary joint distribution with marginalsY i ∼Q. These classes form a strict hierarchy, and, as we show, each step unlocks a fundamentally different mechanism for recovering X. Independent coupling:IfY m ∼Q ⊗m is independent ofX∼P, thenY m carries no information aboutX, and hence H(X|Y m) =H(P)for allm. In this regime, increasingmonly ...

work page

[34] [34]

Under general couplings,H 2(P∥Q) = 0

work page

[35] [35]

covered” part and a “miss

Under any conditionally independent coupling with marginalsY 1, Y2 ∼Q, we haveH(X|Y 1, Y2)>0. Proof.For the general-coupling claim, let(Y 1, Y2)have joint law Pr(Y1 = 0, Y2 = 0) = 1 3 ,Pr(Y 1 = 0, Y2 = 1) = Pr(Y1 = 1, Y2 = 0) = 1 6 ,Pr(Y 1 = 1, Y2 = 1) = 1 3 . 23 ThenY 1, Y2 ∼Ber(1/2). DefineXdeterministically by X=    1,(Y 1, Y2) = (0,0), 2,(Y 1, Y2...

work page