Recognition: 3 theorem links
The information bottleneck method
Pith reviewed 2026-05-11 11:12 UTC · model grok-4.3
The pith
Compressing a signal X through limited codewords can preserve all the information it provides about another signal Y.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define the relevant information in a signal x as the information it provides about y. We formalize the task of finding a short code for x that preserves the maximum information about y as squeezing that information through a bottleneck formed by a limited set of codewords t. This constrained optimization can be seen as a generalization of rate distortion theory in which the distortion measure emerges from the joint statistics of x and y. The variational principle yields an exact set of self-consistent equations for the coding rules from x to t and from t to y, which can be solved by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm.
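In symbols, the variational problem and the self-consistent equations the claim refers to can be written as follows (Z(x, β) is a normalization factor; iterating the three updates in turn is the generalized Blahut-Arimoto scheme):

```latex
% Information bottleneck variational problem
\min_{p(t\mid x)} \; \mathcal{L} \;=\; I(X;T) \;-\; \beta\, I(T;Y)

% Self-consistent equations at the stationary points
p(t\mid x) \;=\; \frac{p(t)}{Z(x,\beta)}
  \exp\!\Big(-\beta\, D_{\mathrm{KL}}\big[\,p(y\mid x)\,\big\|\,p(y\mid t)\,\big]\Big)

p(t) \;=\; \sum_{x} p(x)\, p(t\mid x)

p(y\mid t) \;=\; \frac{1}{p(t)} \sum_{x} p(y\mid x)\, p(t\mid x)\, p(x)
```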
What carries the argument
The bottleneck variable T, the compressed representation of X that is found by optimizing the tradeoff between the information lost in compression and the information retained about Y.
If this is right
- The optimal coding rules X to T and T to Y are given by the fixed points of the self-consistent equations.
- These equations are solved by an iterative re-estimation algorithm that converges to the solution.
- The effective distortion measure in the equivalent rate-distortion problem is determined directly by the joint statistics p(x,y).
- The same variational principle supplies a framework for analyzing problems in signal processing and learning.
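A minimal NumPy sketch of this re-estimation loop on a finite joint distribution, assuming discrete alphabets; the function name, the random soft initialization, and the fixed iteration budget `n_iter` are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the three self-consistent IB updates (Blahut-Arimoto style).

    p_xy: joint distribution over (x, y) as an array of shape (|X|, |Y|)
    summing to 1. Returns the soft encoder p(t|x), shape (|X|, n_clusters).
    """
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                      # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]           # conditional p(y|x)
    eps = 1e-12

    # random soft initialization of the encoder p(t|x)
    p_t_given_x = rng.random((len(p_x), n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x                 # p(t) = sum_x p(x) p(t|x)
        # p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x
        p_y_given_t /= p_t[:, None]
        # D_KL[p(y|x) || p(y|t)] for every (x, t) pair
        log_ratio = (np.log(p_y_given_x[:, None, :] + eps)
                     - np.log(p_y_given_t[None, :, :] + eps))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # self-consistent update: p(t|x) proportional to p(t) exp(-beta * KL)
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x
```

One quick sanity check follows directly from the update equations: inputs x with identical conditionals p(y|x) receive identical soft assignments p(t|x) after a single pass, since the update depends on x only through the KL term.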
Where Pith is reading between the lines
- When the joint distribution must be estimated from finite samples, the method may need additional regularization to remain stable.
- Choosing different target signals Y could turn the same optimization into a tool for supervised or semi-supervised feature extraction.
- The framework suggests that clustering or dimensionality reduction can be performed by treating class labels or future observations as the Y variable.
Load-bearing premise
The joint distribution p(x,y) is known or can be estimated reliably from data so that the mutual information quantities can be computed exactly.
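Since the premise turns on estimating p(x,y) from data, here is a minimal plug-in estimator of the joint and of I(X;Y) from paired discrete samples (illustrative code; the paper does not prescribe an estimator). Its upward bias on small samples is the practical worry noted above about needing regularization:

```python
import numpy as np

def plugin_mutual_information(x, y):
    """Plug-in (maximum-likelihood) estimate of I(X;Y) in nats from
    paired discrete samples. Biased upward for small sample sizes."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)           # empirical joint counts
    joint /= joint.sum()                      # normalize to p_hat(x, y)
    px = joint.sum(axis=1, keepdims=True)     # marginal p_hat(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p_hat(y)
    nz = joint > 0                            # 0 log 0 = 0 convention
    return float((joint[nz] * np.log(joint[nz] / (px * py)[nz])).sum())
```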
What would settle it
Running the re-estimation procedure on a dataset whose joint distribution p(x,y) is known exactly and finding that the resulting coding rules fail to satisfy the self-consistent equations or achieve the predicted levels of information preservation about Y.
read the original abstract
We define the relevant information in a signal $x\in X$ as being the information that this signal provides about another signal $y\in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$, it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a 'bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper defines the relevant information in a signal X about another signal Y as the information preserved through a compressed bottleneck representation T. It formalizes this as a constrained optimization problem maximizing I(T;Y) subject to a bound on I(X;T), shows that this is a generalization of rate-distortion theory in which the distortion measure emerges from the joint p(x,y), derives the exact self-consistent equations for the optimal mappings p(t|x) and p(y|t), and presents a convergent iterative re-estimation algorithm that generalizes the Blahut-Arimoto procedure.
Significance. If the central derivation holds, the work supplies a principled, parameter-light variational framework for relevance-preserving compression with direct applicability to signal processing and learning tasks. Its strengths include the clean derivation of the fixed-point equations from standard mutual-information identities and the Markov chain X–T–Y, the explicit generalization of rate-distortion theory, and the guarantee of monotonic improvement and convergence for finite alphabets.
minor comments (3)
- The abstract states that applications 'will be described in detail elsewhere'; a brief forward reference or one-sentence outline of the intended follow-up would improve self-contained readability.
- Notation for the bottleneck variable alternates between T and X̃ in the abstract; consistent use of a single symbol (e.g., T) throughout the manuscript would reduce minor confusion.
- The weakest assumption—that p(x,y) is known or reliably estimated—is stated clearly but could be highlighted with a short remark on practical estimation procedures in the main text.
Simulated Author's Rebuttal
We thank the referee for the positive summary of our manuscript, the recognition of its strengths, and the recommendation to accept. The referee's description accurately captures the central contributions of the work.
Circularity Check
No significant circularity; derivation is self-contained from mutual information definitions and variational calculus
full rationale
The paper's central derivation starts from the definitions of mutual information I(X;T) and I(T;Y) under the Markov chain X–T–Y, formulates the bottleneck as a constrained optimization problem, introduces a Lagrange multiplier for the I(X;T) term, and obtains the fixed-point equations via functional derivatives. These steps rely only on standard information-theoretic identities and calculus of variations; no parameters are fitted and then relabeled as predictions, no self-citations carry load-bearing uniqueness claims, and the generalization of rate-distortion theory is presented as an interpretive analogy rather than a renaming that substitutes for derivation. The iterative re-estimation procedure is shown to be a valid alternating optimization that monotonically decreases the functional, but this is a consequence of the variational setup rather than a circular reduction. The joint p(x,y) is an external input, matching the stated weakest assumption.
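The key derivational step described above can be displayed explicitly: varying the functional with respect to p(t|x) under a normalization constraint identifies the effective distortion with a KL divergence and yields the exponential fixed-point form (notation follows the paper's quantities; λ(x) is a normalization multiplier):

```latex
% Effective distortion emerging from the joint statistics
d(x,t) \;=\; D_{\mathrm{KL}}\big[\,p(y\mid x)\,\big\|\,p(y\mid t)\,\big]

% Stationarity of
% \mathcal{L} = I(X;T) - \beta I(T;Y) + \sum_x \lambda(x) \sum_t p(t\mid x)
\frac{\delta \mathcal{L}}{\delta p(t\mid x)} = 0
\;\Longrightarrow\;
\log \frac{p(t\mid x)}{p(t)} \;=\; -\beta\, d(x,t) \;-\; \tilde{\lambda}(x)

% i.e. the rate-distortion-like fixed point
p(t\mid x) \;=\; \frac{p(t)}{Z(x,\beta)}\, e^{-\beta\, d(x,t)}
```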
Axiom & Free-Parameter Ledger
free parameters (1)
- beta (β), the Lagrange multiplier that sets the tradeoff between compression I(X;T) and preserved relevance I(T;Y)
axioms (2)
- standard math: Mutual information I(X;Y) = H(X) - H(X|Y) is the measure of relevance.
- domain assumption: The mapping from X to T is a stochastic kernel p(t|x) that can be optimized independently of the downstream mapping from T to Y.
invented entities (1)
- bottleneck variable T (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith.Cost.FunctionalEquation washburn_uniqueness_aczel echoes: "This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. This approach yields an exact set of self consistent equations for the coding rules X → X̃ and X̃ → Y."
- IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced echoes: "Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning"
- IndisputableMonolith.Foundation.LawOfExistence defect_zero_iff_one echoes: "the information that this signal provides about another signal y ∈ Y"
Forward citations
Cited by 60 Pith papers
- Gradient-Based Program Synthesis with Neurally Interpreted Languages · NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
- The Query Channel: Information-Theoretic Limits of Masking-Based Explanations · Masking-based explanations are governed by the information capacity of the query channel, with reliable recovery achievable below capacity via sparse maximum-likelihood decoding but impossible above it.
- Decoupled and Divergence-Conditioned Prompt for Multi-domain Dynamic Graph Foundation Models · DyGFM introduces decoupled pre-training and divergence-conditioned prompts to create the first multi-domain dynamic graph foundation model that outperforms baselines on node classification and link prediction.
- On the Generalization of Knowledge Distillation: An Information-Theoretic View · Knowledge distillation generalization bounds are derived via a new distillation divergence measuring teacher-student kernel difference, with tighter bounds from teacher loss flatness.
- JEDI: Joint Embedding Diffusion World Model for Online Model-Based Reinforcement Learning · JEDI is the first online end-to-end latent diffusion world model that trains latents from denoising loss rather than reconstruction, achieving competitive Atari100k results with 43% less VRAM and over 3x faster sampli...
- The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence? · Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
- Neural Information Causality · Neural-IC separates embedding inequalities from capacity bounds in query-separated computations, with one-bit RAC benchmarks and CHSH-layer stability selecting the Tsirelson threshold for quantum enhancements.
- Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection · A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.
- Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck · CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
- Task Relevance Is Not Local Replaceability: A Two-Axis View of Channel Information · Channel importance splits into task relevance and local replaceability; local-axis metrics predict safe removal under pruning better than target-axis metrics across multiple CNNs and datasets.
- Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection · MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and sup...
- Latent State Design for World Models under Sufficiency Constraints · World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.
- Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective · KV cache eviction is unified under an information capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score based implementation that outperforms prior...
- Modeling Higher-Order Brain Interactions via a Multi-View Information Bottleneck Framework for fMRI-based Psychiatric Diagnosis · A tri-view information-bottleneck model that fuses pairwise, triadic and tetradic O-information outperforms eleven baselines on four fMRI psychiatric datasets while revealing region-level synergy-redundancy patterns.
- Visual Late Chunking: An Empirical Study of Contextual Chunking for Efficient Visual Document Retrieval · ColChunk adaptively chunks visual document patches into contextual multi-vectors via clustering, cutting storage by over 90% while raising average nDCG@5 by 9 points.
- Dream to Control: Learning Behaviors by Latent Imagination · Dreamer learns to control from images by imagining and optimizing behaviors in a learned latent world model, outperforming prior methods on 20 visual tasks in data efficiency and final performance.
- The Diffusion Encoder · A diffusion model serves as the encoder in an autoencoder when trained alternately with the decoder to resolve opposing update directions while retaining the standard diffusion training objective.
- MLGIB: Multi-Label Graph Information Bottleneck for Expressive and Robust Message Passing · MLGIB formulates multi-label graph message passing as constrained information transmission using variational bounds that maximize mutual information with target labels while limiting redundant source information.
- SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory · SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...
- DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift · DeconDTN-Toolkit simulates provenance shifts to expose ERM vulnerabilities and provides tools plus a robust OOD indicator for mitigating confounding by data provenance.
- HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series · HEPA combines self-supervised JEPA pretraining on time series representations with horizon-conditioned finetuning to predict rare events via survival CDFs, outperforming PatchTST, iTransformer, MAE, and Chronos-2 on a...
- HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series · HEPA combines JEPA self-supervised pretraining with horizon-conditioned fine-tuning to predict rare events in multivariate time series as a monotonic survival distribution, outperforming PatchTST, iTransformer, MAE, a...
- EchoPrune: Interpreting Redundancy as Temporal Echoes for Efficient VideoLLMs · EchoPrune prunes video tokens via query relevance and temporal reconstruction error to let VideoLLMs handle up to 20x more frames under fixed budget with reported gains in accuracy and speed.
- Let the Target Select for Itself: Data Selection via Target-Aligned Paths · Target-aligned data selection via normalized endpoint loss drop on a validation-induced reference path achieves competitive performance with reduced computational overhead.
- LBI: Parallel Scan Backpropagation via Latent Bounded Interfaces · LBI enables tractable parallel backpropagation by reducing inter-region adjoint computation to low-dimensional r x r Jacobians while preserving exact gradients under a bounded-interface model.
- Information as Maximum-Caliber Deviation: A bridge between Integrated Information Theory and the Free Energy Principle · Information defined as maximum-caliber deviation derives IIT 3.0 cause-effect repertoires from constrained entropy maximization and equates to prediction error under CLT and LDT.
- The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning · Closed-system multi-step LLM reasoning is subject to an information-theoretic bound where mutual information with evidence decreases, preserving accuracy while eroding faithfulness, with EGSR recovering it on SciFact ...
- When Less is Enough: Efficient Inference via Collaborative Reasoning · A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.
- How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework · LLM OOD detectors are length-confounded; a two-pathway embedding-plus-trajectory framework detects covert OOD inputs at 0.721 average AUROC and 0.850 on jailbreaks.
- Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models · Three frameworks adapt foundation models for generalized category discovery under domain shifts via disentanglement and prompt tuning, showing gains on synthetic and real multi-domain data.
- Subgraph Concept Networks: Concept Levels in Graph Classification · Subgraph Concept Network is a new GNN architecture that distills meaningful concepts at node, subgraph, and graph levels via soft clustering to improve explainability while maintaining competitive accuracy.
- LLM Safety From Within: Detecting Harmful Content with Internal Representations · SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation · OneVL is the first latent CoT method to exceed explicit CoT accuracy on four driving benchmarks while running at answer-only speed, by supervising latent tokens with a visual world model decoder.
- Xiaomi OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation · OneVL achieves superior accuracy to explicit chain-of-thought reasoning at answer-only latency by supervising latent tokens with a visual world model decoder that predicts future frames.
- CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference · CoDe-R refines LLM decompiler output via rationale-guided semantic injection and dynamic fallback inference, making a 1.3B model the first to exceed 50% average re-executability on HumanEval-Decompile.
- Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging · An information-theoretic optimization framework for task-adapted CS-MRI enables adaptive sampling at arbitrary ratios and probabilistic inference for uncertainty while supporting joint reconstruction-task or privacy-f...
- MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models · MODIX dynamically rescales positional indices in VLMs using intra-modal covariance-based entropy and inter-modal alignment scores to allocate finer granularity to informative content.
- Bridging What the Model Thinks and How It Speaks: Self-Aware Speech Language Models for Expressive Speech Generation · SA-SLM uses variational information bottleneck for intent-aware bridging and self-criticism for realization-aware alignment to close the semantic-acoustic gap, outperforming open-source models and nearing GPT-4o-Audio...
- Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts · Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-paramete...
- GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control · GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on cont...
- Variational Feature Compression for Model-Specific Representations · A variational latent bottleneck with KL regularization and a dynamic binary mask based on saliency produces model-specific features that keep high accuracy for one classifier but drop others below 2% on CIFAR-100 with...
- PDMP: Rethinking Balanced Multimodal Learning via Performance-Dominant Modality Prioritization · Imbalanced multimodal learning that prioritizes the performance-dominant modality via unimodal ranking and asymmetric gradient modulation outperforms balanced approaches.
- Super Agents and Confounders: Influence of surrounding agents on vehicle trajectory prediction · Surrounding agents frequently degrade trajectory prediction accuracy in interactive driving scenes, and integrating a Conditional Information Bottleneck improves results by ignoring non-beneficial contextual signals.
- Back to Basics: Let Denoising Generative Models Denoise · Directly predicting clean data with large-patch pixel Transformers enables strong generative performance in diffusion models where noise prediction fails at high dimensions.
- Task-Aware Answer Preservation under Audio Compression for Large Audio Language Models · A statistical sign-off protocol for audio compressors ensures worst-case answer preservation across query families in LALMs.
- Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective · A minimal three-variable dynamical model of human-AI feedback predicts that increasing reliance on AI induces a transition to a low-diversity suboptimal equilibrium, interpreted as an emergent information bottleneck.
- Distributed Deep Variational Approach for Privacy-preserving Data Release · GPP trains local variational encoders in federated settings to release representations that keep utility within 1% of an autoencoder baseline while driving adversary AUC on sensitive attributes to near-random levels o...
- Learning Fingerprints for Medical Time Series with Redundancy-Constrained Information Maximization · A self-supervised method learns a fixed set of disentangled fingerprint tokens from medical time series by combining reconstruction loss with a total coding rate diversity penalty, framed as a disentangled rate-distor...
- Vib2Conf: AI-driven discrimination of molecular conformations from vibrational spectra · Vib2Conf achieves over 95% top-1 recall on standard spectrum-to-structure benchmarks and 82% recall for distinguishing near-isomeric 3D conformers differing by only ~1 Å RMSD.
- Sema: Semantic Transport for Real-Time Multimodal Agents · Sema reduces uplink bandwidth by 64x for audio and 130-210x for screenshots while keeping multimodal agent task accuracy within 0.7 percentage points of raw baselines in WAN simulations.
- Absorber LLM: Harnessing Causal Synchronization for Test-Time Training · Absorber LLM introduces causal synchronization to absorb context into parameters for memory-efficient long-context LLM inference while preserving causal effects.
- Sensitivity Uncertainty Alignment in Large Language Models · SUA measures the gap between how much an LLM's output changes under perturbations and how uncertain the model claims to be, with a training procedure to reduce that gap.
- Community Detection with the Canonical Ensemble · Community detection is treated as hypothesis testing with test statistics and canonical-ensemble null models that maximize entropy under chosen constraints.
- PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment · PortraitDirector uses hierarchical disentanglement of spatial physical motions and semantic emotions to deliver controllable, high-fidelity real-time facial reenactment at 20 FPS.
- Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective · CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...
- Retrieval-Augmented Multimodal Model for Fake News Detection · RAMM improves multimodal fake news detection by retrieving abstract narrative consistencies across instances and shifting to analogical reasoning via an MLLM backbone and two alignment modules.
- In Search of Lost DNA Sequence Pretraining · DNA pretraining suffers from inappropriate evaluation datasets, flawed neighbor-masking, and neglected vocabulary design; the authors supply guidelines and a reproducible testbed to fix them.
- The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents · Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.
- Position: Life-Logging Video Streams Make the Privacy-Utility Trade-off Inevitable · Life-logging video streams create an inevitable privacy-utility trade-off that is a foundational challenge for always-on AI systems.
- Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition · MCUR improves multimodal emotion recognition across heterogeneous modality setups by combining modality-combination contrastive learning with sample-wise uncertainty regularization, yielding F1 gains of 2.2-4.37% on M...
Reference graph
Works this paper leans on
- [1] W. Bialek and N. Tishby, "Extracting relevant information," in preparation.
- [2] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley, New York, 1991).
- [3] I. Csiszár and G. Tusnády, "Information geometry and alternating minimization procedures," Statistics and Decisions Suppl. 1, 205–237 (1984).
- [4] R. E. Blahut, "Computation of channel capacity and rate distortion function," IEEE Trans. Inform. Theory IT-18, 460–473 (1972).
- [5] N. Slonim and N. Tishby, "Agglomerative information bottleneck," to appear in Advances in Neural Information Processing Systems (NIPS-12), 1999.
- [6] F. C. Pereira, N. Tishby, and L. Lee, "Distributional clustering of English words," in 30th Annual Mtg. of the Association for Computational Linguistics, pp. 183–190 (1993).