pith. machine review for the scientific record.

arxiv: 1511.02793 · v2 · submitted 2015-11-09 · 💻 cs.LG · cs.CV

Recognition: unknown

Generating Images from Captions with Attention

keywords: model, images, captions, generates, generative, models, approaches, attending

Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
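The loop the abstract describes (draw a patch, attend to the words, repeat) can be illustrated with a toy sketch. This is not the paper's architecture: the weights are random and untrained, and every name here (`draw_with_attention`, `W_query`, `W_state`, `W_patch`, `W_pos`, the canvas and patch sizes) is a hypothetical choice made only to show the control flow of soft attention over word vectors combined with additive writes to a canvas.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def draw_with_attention(word_vecs, steps=8, canvas_size=16, patch_size=4, seed=0):
    """Toy loop: attend over the caption's words, then add one patch per step.

    word_vecs: array of shape (n_words, dim), one vector per caption word.
    Returns the final canvas and the per-step attention weights.
    """
    rng = np.random.default_rng(seed)
    n_words, dim = word_vecs.shape

    # Hypothetical, untrained parameters standing in for learned weights.
    W_query = rng.normal(scale=0.1, size=(dim, dim))
    W_state = rng.normal(scale=0.1, size=(dim, dim))
    W_patch = rng.normal(scale=0.1, size=(dim, patch_size * patch_size))
    W_pos = rng.normal(scale=0.1, size=(dim, 2))

    canvas = np.zeros((canvas_size, canvas_size))
    state = np.zeros(dim)
    attention_history = []

    for _ in range(steps):
        # Soft attention: score each word against the current state.
        scores = word_vecs @ (W_query @ state)
        weights = softmax(scores)
        context = weights @ word_vecs  # weighted average of word vectors

        # Update the drawing state from its previous value and the context.
        state = np.tanh(W_state @ state + context)

        # Choose patch content and a top-left position inside the canvas.
        patch = (state @ W_patch).reshape(patch_size, patch_size)
        pos = 1.0 / (1.0 + np.exp(-(state @ W_pos)))  # each entry in (0, 1)
        row, col = (pos * (canvas_size - patch_size)).astype(int)

        # Additive write: the canvas accumulates patches over the steps.
        canvas[row:row + patch_size, col:col + patch_size] += patch
        attention_history.append(weights)

    return canvas, np.array(attention_history)


if __name__ == "__main__":
    # Five made-up word vectors of dimension 8 stand in for an encoded caption.
    words = np.random.default_rng(1).normal(size=(5, 8))
    canvas, attn = draw_with_attention(words)
    print(canvas.shape, attn.shape)
```

Each row of the returned attention array sums to 1, so it can be read as how strongly each word influenced that drawing step. A trained model would learn the parameters from image-caption pairs rather than sample them at random.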

This paper has not been read by Pith yet.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LAION-5B: An open large-scale dataset for training next generation image-text models

    cs.CV · 2022-10 · accept · novelty 7.0

    LAION-5B is an openly released dataset of 5.85 billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.

  2. Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

    cs.CV · 2026-05 · unverdicted · novelty 6.0

    Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to r...