Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.
Learning Transferable Visual Models From Natural Language Supervision
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2022 1verdicts
ACCEPT 1representative citing papers
citing papers explorer
-
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.