citation dossier

ediff-i: Text-to-image diffusion models with ensemble of expert denoisers

Balaji, Y · 2022 · arXiv 2211.01324

18Pith papers citing it

18reference links

cs.CVtop field · 15 papers

UNVERDICTEDtop verdict bucket · 14 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 18 reviewed papers. Its strongest current cluster is cs.CV (15 papers). The largest review-status bucket among citing papers is UNVERDICTED (14 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

Consistency Models

cs.LG · 2023-03-02 · conditional · novelty 8.0

Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.

A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

FP-FM adapts flow matching models to unseen distributions via least-squares projection onto basis functions spanning training velocity fields, yielding improved precision and recall without inference-time training.

Benchmarking Layout-Guided Diffusion Models through Unified Semantic-Spatial Evaluation in Closed and Open Settings

cs.CV · 2026-04-28 · conditional · novelty 7.0

Introduces closed-set C-Bench and open-set O-Bench for layout-guided diffusion models, a unified semantic-spatial scoring protocol, and ranks six models after generating and evaluating 319,086 images.

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

cs.CV · 2024-03-08 · unverdicted · novelty 7.0

ELLA introduces a timestep-aware semantic connector to link LLMs with diffusion models for improved dense prompt following, validated on a new 1K-prompt benchmark.

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

cs.CV · 2023-07-10 · unverdicted · novelty 7.0

A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.

Leveraging Verifier-Based Reinforcement Learning in Image Editing

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

Edit-R1 trains a CoT-based reasoning reward model with GCPO and uses it to boost image editing performance over VLMs and models like FLUX.1-kontext via GRPO.

Temporally Extended Mixture-of-Experts Models

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

Temporally extended MoE layers using the option-critic framework with deliberation costs cut switching rates below 5% while retaining most capability on MATH, MMLU, and MMMLU.

OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner

cs.CV · 2026-04-14 · conditional · novelty 6.0

OFA-Diffusion Compression trains diffusion models once to yield multiple size-specific compressed subnetworks via restricted candidate spaces, importance-based channel allocation, and reweighting.

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

cs.CV · 2024-04-02 · unverdicted · novelty 6.0

CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

cs.CV · 2023-08-13 · unverdicted · novelty 6.0

IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

cs.CV · 2023-07-04 · conditional · novelty 6.0

SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.

Embody4D: A Generalist 4D World Model for Embodied AI

cs.CV · 2026-05-03 · unverdicted · novelty 5.0

Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.

DiffMagicFace: Identity Consistent Facial Editing of Real Videos

cs.CV · 2026-04-15 · unverdicted · novelty 5.0

DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.

ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression

cs.CV · 2026-04-15 · unverdicted · novelty 5.0

ADP-DiT is a text-conditioned diffusion transformer for synthesizing longitudinal Alzheimer's MRI scans, reporting SSIM 0.8739 and PSNR 29.32 dB with improvements over a DiT baseline.

3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

A framework that combines MLLM-based image enhancement with a medium-aware 3D Gaussian Splatting model to reconstruct and render smoke scenes.

Movie Gen: A Cast of Media Foundation Models

cs.CV · 2024-10-17 · unverdicted · novelty 5.0

A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.

Cosmos World Foundation Model Platform for Physical AI

cs.CV · 2025-01-07 · unverdicted · novelty 3.0

The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.

citing papers explorer

Showing 18 of 18 citing papers.

Consistency Models cs.LG · 2023-03-02 · conditional · none · ref 1
Consistency models achieve fast one-step generation with SOTA FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 by directly mapping noise to data, outperforming prior distillation techniques.
A Flow Matching Algorithm for Many-Shot Adaptation to Unseen Distributions cs.LG · 2026-05-07 · unverdicted · none · ref 22
FP-FM adapts flow matching models to unseen distributions via least-squares projection onto basis functions spanning training velocity fields, yielding improved precision and recall without inference-time training.
Benchmarking Layout-Guided Diffusion Models through Unified Semantic-Spatial Evaluation in Closed and Open Settings cs.CV · 2026-04-28 · conditional · none · ref 2
Introduces closed-set C-Bench and open-set O-Bench for layout-guided diffusion models, a unified semantic-spatial scoring protocol, and ranks six models after generating and evaluating 319,086 images.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment cs.CV · 2024-03-08 · unverdicted · none · ref 5
ELLA introduces a timestep-aware semantic connector to link LLMs with diffusion models for improved dense prompt following, validated on a new 1K-prompt benchmark.
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning cs.CV · 2023-07-10 · unverdicted · none · ref 1
A single motion module trained on videos adds temporally coherent animation to any personalized text-to-image model derived from the same base without additional tuning.
Leveraging Verifier-Based Reinforcement Learning in Image Editing cs.CV · 2026-04-30 · unverdicted · none · ref 2
Edit-R1 trains a CoT-based reasoning reward model with GCPO and uses it to boost image editing performance over VLMs and models like FLUX.1-kontext via GRPO.
Temporally Extended Mixture-of-Experts Models cs.LG · 2026-04-22 · unverdicted · none · ref 2
Temporally extended MoE layers using the option-critic framework with deliberation costs cut switching rates below 5% while retaining most capability on MATH, MMLU, and MMMLU.
OFA-Diffusion Compression: Compressing Diffusion Model in One-Shot Manner cs.CV · 2026-04-14 · conditional · none · ref 1
OFA-Diffusion Compression trains diffusion models once to yield multiple size-specific compressed subnetworks via restricted candidate spaces, importance-based channel allocation, and reweighting.
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer cs.CV · 2026-04-07 · unverdicted · none · ref 4
PoM is a new linear-complexity token mixer using learned polynomials that matches attention performance in transformers while enabling efficient long-sequence processing.
CameraCtrl: Enabling Camera Control for Text-to-Video Generation cs.CV · 2024-04-02 · unverdicted · none · ref 7
CameraCtrl enables accurate camera pose control in video diffusion models through a trained plug-and-play module and dataset choices emphasizing diverse camera trajectories with matching appearance.
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models cs.CV · 2023-08-13 · unverdicted · none · ref 5
IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis cs.CV · 2023-07-04 · conditional · none · ref 1
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
Embody4D: A Generalist 4D World Model for Embodied AI cs.CV · 2026-05-03 · unverdicted · none · ref 5
Embody4D generates high-fidelity, view-consistent novel views from monocular videos for embodied scenarios via 3D-aware data synthesis, adaptive noise injection, and interaction-aware attention.
DiffMagicFace: Identity Consistent Facial Editing of Real Videos cs.CV · 2026-04-15 · unverdicted · none · ref 46
DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.
ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression cs.CV · 2026-04-15 · unverdicted · none · ref 15
ADP-DiT is a text-conditioned diffusion transformer for synthesizing longitudinal Alzheimer's MRI scans, reporting SSIM 0.8739 and PSNR 29.32 dB with improvements over a DiT baseline.
3D Smoke Scene Reconstruction Guided by Vision Priors from Multimodal Large Language Models cs.CV · 2026-04-07 · unverdicted · none · ref 1
A framework that combines MLLM-based image enhancement with a medium-aware 3D Gaussian Splatting model to reconstruct and render smoke scenes.
Movie Gen: A Cast of Media Foundation Models cs.CV · 2024-10-17 · unverdicted · none · ref 2
A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.
Cosmos World Foundation Model Platform for Physical AI cs.CV · 2025-01-07 · unverdicted · none · ref 8
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.

ediff-i: Text-to-image diffusion models with ensemble of expert denoisers

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer