Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

2017 · cs.CV · arXiv:1711.07068

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

This paper explores image caption generation using conditional variational auto-encoders (CVAEs). Standard CVAEs with a fixed Gaussian prior yield descriptions with too little variability. Instead, we propose two models that explicitly structure the latent space around $K$ components corresponding to different types of image content, and combine components to create priors for images that contain multiple types of content simultaneously (e.g., several kinds of objects). Our first model uses a Gaussian Mixture model (GMM) prior, while the second one defines a novel Additive Gaussian (AG) prior that linearly combines component means. We show that both models produce captions that are more diverse and more accurate than a strong LSTM baseline or a "vanilla" CVAE with a fixed Gaussian prior, with AG-CVAE showing particular promise.
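The difference between the two priors can be illustrated with a minimal sketch. This is one reading of the abstract, not the authors' implementation: the uniform weighting over the content types present in an image, the shared isotropic standard deviation `sigma`, the latent size, and all function names are assumptions made for illustration. Under the GMM prior a sample is drawn around one selected component mean, while under the AG prior it is drawn around the weighted sum of the component means.

```python
import numpy as np

def content_weights(present, num_components):
    """Hypothetical helper: uniform weights over the content types present in an image."""
    w = np.zeros(num_components)
    w[list(present)] = 1.0 / len(present)
    return w

def sample_gmm_prior(means, weights, sigma, rng):
    """GMM-style prior: choose ONE component by weight, then sample around its mean."""
    k = rng.choice(len(means), p=weights)
    return means[k] + sigma * rng.standard_normal(means.shape[1])

def sample_ag_prior(means, weights, sigma, rng):
    """AG-style prior: sample around the linear combination of ALL component means."""
    combined_mean = weights @ means
    return combined_mean + sigma * rng.standard_normal(means.shape[1])

rng = np.random.default_rng(0)
K, D = 4, 3                              # assumed: 4 content types, 3-dim latent space
means = rng.standard_normal((K, D))      # stand-in for learned component means
weights = content_weights({0, 2}, K)     # image assumed to contain content types 0 and 2

z_gmm = sample_gmm_prior(means, weights, sigma=0.1, rng=rng)
z_ag = sample_ag_prior(means, weights, sigma=0.1, rng=rng)
```

In this sketch, an image with two content types yields GMM samples that cluster near one component mean or the other, whereas AG samples cluster near a single point between the two.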

representative citing papers

Morphology-Aware Peptide Discovery via Masked Conditional Generative Modeling

q-bio.BM · 2025-09-02 · unverdicted · novelty 6.0

PepMorph generates morphology-targeted peptides via a Transformer conditional VAE and reports 83% success under CG-MD validation.

citing papers explorer

Showing 1 of 1 citing paper.

Morphology-Aware Peptide Discovery via Masked Conditional Generative Modeling

q-bio.BM · 2025-09-02 · unverdicted · none · ref 33 · internal anchor
