Understanding disentangling in β-VAE
We present new intuitions and theoretical assessments of the emergence of disentangled representation in variational autoencoders. Taking a rate-distortion theory perspective, we show the circumstances under which representations aligned with the underlying generative factors of variation of data emerge when optimising the modified ELBO bound in $\beta$-VAE, as training progresses. From these insights, we propose a modification to the training regime of $\beta$-VAE, that progressively increases the information capacity of the latent code during training. This modification facilitates the robust learning of disentangled representations in $\beta$-VAE, without the previous trade-off in reconstruction accuracy.
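The training modification described in the abstract can be illustrated with a small sketch. It assumes the capacity-controlled form of the objective, in which the KL term is pulled toward a target capacity C that grows over training: loss = reconstruction loss + gamma * |KL - C|. The function names, the linear schedule, and the default values for `gamma`, `c_max`, and `anneal_steps` are illustrative assumptions, not settings confirmed by this page.

```python
import math


def linear_capacity(step, c_max=25.0, anneal_steps=100_000):
    """Target KL capacity C in nats, raised linearly from 0 to c_max.

    c_max and anneal_steps are illustrative defaults (assumptions).
    """
    return c_max * min(step / anneal_steps, 1.0)


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one sample, in nats."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar)
    )


def capacity_loss(recon_loss, kl, step, gamma=1000.0,
                  c_max=25.0, anneal_steps=100_000):
    """Reconstruction loss plus gamma * |KL - C(step)|.

    The penalty is zero only when the encoder uses exactly the
    capacity currently allowed by the schedule.
    """
    c = linear_capacity(step, c_max, anneal_steps)
    return recon_loss + gamma * abs(kl - c)
```

Early in training C is close to zero, so the latent code can carry very little information; as C rises, the model is allowed to encode more. In a real training loop `recon_loss` and `kl` would come from the decoder and encoder outputs for a batch.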
Forward citations
Cited by 20 Pith papers
- Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion
  Inference-time refinement of pre-trained tabular diffusion models via Bidirectional Chamfer Refinement achieves median 8.6% better downstream performance than real data across 15 benchmarks while preserving fidelity a...
- Gradient-Based Program Synthesis with Neurally Interpreted Languages
  NLI autonomously discovers a vocabulary of primitive operations and interprets variable-length programs via a neural executor, allowing end-to-end training and gradient-based test-time adaptation that outperforms prio...
- StrADiff: A Structured Source-Wise Adaptive Diffusion Framework for Linear and Nonlinear Blind Source Separation
  StrADiff recovers latent source trajectories from linear and nonlinear mixtures via source-wise adaptive diffusion and a Gaussian process prior in a single unsupervised end-to-end objective.
- Score-based Membership Inference on Diffusion Models
  Presents SimA, a score-based single-query membership inference attack for diffusion models and LDMs that uses denoiser output norm to reveal training set proximity and outperforms multi-query baselines on eight datasets.
- Posterior Collapse as Automatic Spectral Pruning
  Posterior collapse in β-VAEs is derived as automatic spectral pruning via Landau stability analysis, with collapse thresholds matching normalized PCA spectra in the linear Gaussian case and tested on WorldClim data.
- Winner-Take-All bottlenecks enforce disentangled symbolic representations in multi-task learning
  WTA bottlenecks enforce highly symbolic, disentangled categorical representations of latent factors under defined conditions in multi-task DNNs, shown via theorem and experiments on two datasets.
- Vision Foundation Models as Generalist Tokenizers for Image Generation
  VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.
- Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling
  A hybrid VAE with architectural annealing learns discrete clusters aligned with scanner and protocol differences in a dataset of 7416 structural connectomes spanning 13 studies.
- A renormalization-group inspired lattice-based framework for piecewise generalized linear models
  RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generali...
- Discovering quantum phenomena with Interpretable Machine Learning
  Variational autoencoders combined with symbolic regression extract physically meaningful representations and order parameters from raw quantum measurement data, revealing new phenomena such as corner-ordering in Rydbe...
- Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing
  RF-CMG synthesizes high-quality mmWave and RFID signals from WiFi using a diffusion model with Modality-Guided Embedding for high-frequency details and Low-Frequency Modality Consistency to preserve physical structure.
- From Unsupervised to Guided Clustering: A Variational Implementation
  GCVAE is a variational autoencoder that structures its latent space as a Gaussian mixture and optimizes a variational objective to make the representation maximally informative about a user-chosen guiding variable, en...
- Cyclic Adaptive Private Synthesis for Sharing Real-World Data in Education
  CAPS provides an iterative differentially private synthesis method that outperforms one-shot baselines on authentic educational real-world data.
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
  Sparse feature circuits are introduced as interpretable causal subnetworks in language models, supporting unsupervised discovery of thousands of circuits and a method called SHIFT to improve classifier generalization ...
- Deep Attention Reweighting: Post-Hoc Attention-Based Feature Aggregation in CNNs for Disentangling Core and Spurious Features under Spurious Correlations
  DAR replaces GAP with an attention-based aggregation module retrained jointly with the classifier head to disentangle core from spurious features and outperforms DFR on multiple datasets.
- To Use AI as Dice of Possibilities with Timing Computation
  Proposes verb-based paradigm with timing computation to enable data-driven discovery of patient trajectories and counterfactual timing from EHR data without domain knowledge.
- Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds
  Aligning the DDIM forward diffusion process with flow-matching manifold evolution enables high-quality generation without time conditioning, and class-conditional synthesis is possible with an unconditional denoiser b...
- A Systematic Framework for Tabular Data Disentanglement
  A systematic framework modularizes tabular data disentanglement into data extraction, modeling, analysis, and latent extrapolation, with a case study on synthetic data generation.
- Approximately Equivariant Recurrent Generative Models for Quasi-Periodic Time Series with a Progressive Training Scheme
  AEQ-RVAE-ST combines approximate equivariance and progressive sequence lengthening in a recurrent VAE to match or exceed prior generative models on quasi-periodic time series benchmarks.
- Learning Disentangled Representations for Generalized Multi-view Clustering
  GMAE learns disentangled view-specific and view-common embeddings via dual-path autoencoders and cross-view adversarial training to boost performance on complete and incomplete multi-view clustering tasks.