On Variational Bounds of Mutual Information

Aaron van den Oord; Alexander A. Alemi; Ben Poole; George Tucker; Sherjil Ozair

arxiv: 1905.06922 · v1 · pith:4UOLLLVXnew · submitted 2019-05-16 · 💻 cs.LG · stat.ML

Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker

classification 💻 cs.LG stat.ML

keywords bounds, bias, high, variance, variational, information, learning, lower

Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remain unclear. In this work, we unify these recent developments in a single framework. We find that the existing variational lower bounds degrade when the MI is large, exhibiting either high bias or high variance. To address this problem, we introduce a continuum of lower bounds that encompasses previous bounds and flexibly trades off bias and variance. On high-dimensional, controlled problems, we empirically characterize the bias and variance of the bounds and their gradients and demonstrate the effectiveness of our new bounds for estimation and representation learning.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Harmony in Diversity: Multi-domain Contrastive Policy Optimization for Large Reasoning Models
cs.CL 2026-05 unverdicted novelty 7.0

MCPO applies contrastive learning to GRPO-style RL by treating cross-domain correct rollouts as positives and incorrect ones as negatives to improve multi-domain reasoning performance in LRMs.

Dream to Control: Learning Behaviors by Latent Imagination
cs.LG 2019-12 accept novelty 7.0

Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.

PoLAR: Factorizing Extent and Mode in Latent Actions for Robot Policy Learning
cs.RO 2026-06 unverdicted novelty 6.0

PoLAR imposes radial structure on latent actions in hyperbolic space to factorize extent and mode, improving robot policy performance over baselines.

Information theoretic underpinning of self-supervised learning by clustering
cs.LG 2026-05 unverdicted novelty 5.0

SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.