Diffusionclip: Text-guided diffusion models for robust image manipulation

Gihyun Kim, Taehoon Kwon, Jong Chul Ye · 2022 · arXiv 2110.02711

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Voxify3D: Pixel Art Meets Volumetric Rendering

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

cs.CV · 2022-05-23 · accept · novelty 7.0

Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.

Controlla: Learning Controllability via Graph-Constrained Latent Geometry

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors using graph-constrained optimal transport to enforce consistent attribute trajectories while preserving reference identity.

citing papers explorer

Showing 3 of 3 citing papers.

Voxify3D: Pixel Art Meets Volumetric Rendering cs.CV · 2025-12-08 · unverdicted · none · ref 42
Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding cs.CV · 2022-05-23 · accept · none · ref 34
Imagen achieves state-of-the-art photorealistic text-to-image generation by scaling a text-only pretrained T5 language model within a diffusion framework, reaching FID 7.27 on COCO without training on it.
Controlla: Learning Controllability via Graph-Constrained Latent Geometry cs.CV · 2026-05-15 · unverdicted · none · ref 22
Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors using graph-constrained optimal transport to enforce consistent attribute trajectories while preserving reference identity.

Diffusionclip: Text-guided diffusion models for robust image manipulation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer