High Fidelity Visualization of What Your Self-Supervised Representation Knows About
3 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (3)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- Hierarchical Text-Conditional Image Generation with CLIP Latents
  A hierarchical prior-decoder model using CLIP latents generates more diverse text-conditional images than direct methods while preserving photorealism and caption fidelity.
- Representation-Conditioned Diffusion Models for Guided Training Data Generation
  Representation-conditioned diffusion models generate synthetic ImageNet data that trains classifiers to higher top-1 accuracy than class-conditioned generation (+10.76 pp) or real data (+2.0 pp when scaled).
- Towards Controllable Image Generation through Representation-Conditioned Diffusion Models
  Self-supervised representation conditioning on diffusion models boosts unconditional generation quality and yields a controllable space with smooth, disentangled variations.