Importance Weighted Autoencoders

11 Pith papers cite this work. Polarity classification is still indexing.

abstract

The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It typically makes strong assumptions about posterior inference, for instance that the posterior distribution is approximately factorial, and that its parameters can be approximated with nonlinear regression from the observations. As we show empirically, the VAE objective can lead to overly simplified representations which fail to use the network's entire modeling capacity. We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.
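To make the "strictly tighter lower bound" in the abstract concrete, here is a minimal NumPy sketch of the k-sample importance weighted bound, L_k = E[log (1/k) Σ_i w_i] with w_i = p(x, z_i) / q(z_i | x), where k = 1 recovers the standard VAE bound. The toy model (z ~ N(0, 1), x | z ~ N(z, 1)), the deliberately mismatched Gaussian proposal, and the function names `log_normal` and `iwae_bound` are assumptions chosen for illustration; they are not taken from the paper, which uses neural networks for both the generative and recognition models.

```python
import numpy as np


def log_normal(v, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2) evaluated at v."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(std) - 0.5 * ((v - mean) / std) ** 2


def iwae_bound(x, k, q_mean, q_std, n_outer=20000, seed=0):
    """Monte Carlo estimate of the k-sample importance weighted bound on log p(x).

    Toy model (an assumption for this sketch): z ~ N(0, 1), x | z ~ N(z, 1).
    Proposal: q(z | x) = N(q_mean, q_std^2). With k = 1 this is the usual ELBO.
    """
    rng = np.random.default_rng(seed)
    # n_outer independent groups of k proposal samples each.
    z = q_mean + q_std * rng.standard_normal((n_outer, k))
    # Log importance weights: log p(z) + log p(x | z) - log q(z | x).
    log_w = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0) - log_normal(z, q_mean, q_std)
    # Numerically stable log of the mean weight within each group.
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    # Average over groups to approximate the outer expectation.
    return float(log_mean_w.mean())


x_obs = 1.5
# For this toy model the marginal is available in closed form: x ~ N(0, 2).
true_log_px = float(log_normal(x_obs, 0.0, np.sqrt(2.0)))
for k in (1, 5, 50):
    # The proposal is set to the prior, so it does not match the true posterior.
    print(k, iwae_bound(x_obs, k, q_mean=0.0, q_std=1.0), true_log_px)
```

In this sketch the estimate should rise toward the exact log-likelihood as k grows, even though the proposal stays fixed and poorly matched to the posterior, which is the behaviour the abstract describes.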

citation-role summary

background 2 · method 1

representative citing papers

Density estimation using Real NVP

cs.LG · 2016-05-27 · accept · novelty 8.0

Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

Learning to Theorize the World from Observation

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

Revisiting the Volume Hypothesis

cs.LG · 2026-06-30 · unverdicted · novelty 6.0

The generalization advantage of SGD over random sampling diminishes with growing training set size in binary networks, as measured by joint density of states over train and test accuracy.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

citing papers explorer

Showing 11 of 11 citing papers.

  • Density estimation using Real NVP cs.LG · 2016-05-27 · accept · none · ref 10

    Real NVP uses affine coupling layers to create invertible transformations that support exact density estimation, sampling, and latent inference without approximations.

  • MirrorCheck: Efficient Adversarial Defense for Vision-Language Models cs.CV · 2024-06-13 · unverdicted · none · ref 10 · internal anchor

    MirrorCheck detects adversarial attacks on VLMs via T2I regeneration for semantic consistency checks, using stochastic model selection and one-time perturbations for robustness against adaptive attacks.

  • End-to-End Identifiable and Consistent Recurrent Switching Dynamical Systems stat.ML · 2026-05-07 · unverdicted · none · ref 8

    Identifiability is proven for recurrent nonlinear switching dynamical systems under flexible assumptions, and ΩSDS is introduced as a flow-based estimator that improves disentanglement and forecasting over VAE-based methods.

  • Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 205

    NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.

  • Revisiting the Volume Hypothesis cs.LG · 2026-06-30 · unverdicted · none · ref 122 · internal anchor

    The generalization advantage of SGD over random sampling diminishes with growing training set size in binary networks, as measured by joint density of states over train and test accuracy.

  • Efficient Learning of Deep State Space Models via Importance Smoothing cs.LG · 2026-05-20 · unverdicted · none · ref 3 · internal anchor

    PVMC is a new parallel training algorithm for deep state space models that achieves 10x faster training than prior SMC methods while matching or exceeding benchmark performance for both generative and discriminative tasks.

  • Continuous Diffusion Scales Competitively with Discrete Diffusion for Language cs.CL · 2026-05-18 · conditional · none · ref 5 · internal anchor

    RePlaid achieves a 20x compute gap to autoregressive models, new SOTA PPL of 22.1 among continuous DLMs on OpenWebText, and competitive scaling laws by aligning architecture with modern discrete DLMs.

  • Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning cs.RO · 2026-02-09 · unverdicted · none · ref 95 · internal anchor

    R&B-EnCoRe uses self-supervised importance-weighted variational inference to distill action-predictive reasoning datasets that improve VLA performance on manipulation, navigation, and driving tasks without external verifiers.

  • A renormalization-group inspired lattice-based framework for piecewise generalized linear models stat.ME · 2026-05-06 · unverdicted · none · ref 72

    RG-inspired lattice models for piecewise GLMs provide explicit interpretable partitions and a replica-analysis-derived scaling law for regularization that allows increasing complexity without expected rise in generalization loss.

  • QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL cs.LG · 2026-05-03 · unverdicted · none · ref 21

    QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

  • Mitigating Barren Plateaus in Quantum Denoising Diffusion Probabilistic Model cs.LG · 2025-12-07 · unverdicted · none · ref 5 · internal anchor

    Quantum diffusion models develop a distinct barren plateau beyond small qubit counts; an architectural enhancement and conditional formulation restore trainability for Hamiltonian-parameterized ground-state generation.
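The affine coupling layer mentioned in the Real NVP entry above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the helper name `make_coupling`, the tiny random tanh network producing the scale and shift, and the half-and-half split of the input are assumptions, not the architecture from that paper. The point is that the transformation is invertible in closed form and its log-determinant is just the sum of the scale outputs.

```python
import numpy as np


def make_coupling(dim, hidden=16, seed=0):
    """Build one affine coupling layer with fixed random weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    d = dim // 2  # the first d coordinates pass through unchanged
    w_in = rng.normal(scale=0.5, size=(d, hidden))
    w_scale = rng.normal(scale=0.5, size=(hidden, dim - d))
    w_shift = rng.normal(scale=0.5, size=(hidden, dim - d))

    def scale_shift(x1):
        # Log-scale s and shift t depend only on the untouched half.
        h = np.tanh(x1 @ w_in)
        return np.tanh(h @ w_scale), h @ w_shift

    def forward(x):
        x1, x2 = x[:, :d], x[:, d:]
        s, t = scale_shift(x1)
        y = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
        # The Jacobian is triangular, so log|det| is the sum of s.
        return y, s.sum(axis=1)

    def inverse(y):
        y1, y2 = y[:, :d], y[:, d:]
        s, t = scale_shift(y1)  # y1 equals x1, so s and t can be recomputed exactly
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

    return forward, inverse


forward, inverse = make_coupling(4)
x = np.random.default_rng(1).normal(size=(3, 4))
y, log_det = forward(x)
print(np.allclose(inverse(y), x), log_det.shape)
```

Because the first half of the input is copied through, a practical flow would stack several such layers and alternate which half is transformed.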