Diffusion model alignment using direct preference optimization

Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik · 2024

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.

Step-level Denoising-time Diffusion Alignment with Multiple Objectives

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

MSDDA derives a closed-form optimal reverse denoising distribution for multi-objective diffusion alignment that is exactly equivalent to step-level RL fine-tuning with no approximation error.

Leveraging Verifier-Based Reinforcement Learning in Image Editing

cs.CV · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

Edit-R1 builds a CoT-based reasoning reward model (RRM) via SFT and GCPO, then applies it with GRPO to improve image editing models such as FLUX.1-kontext.

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

cs.CV · 2025-10-02 · conditional · novelty 6.0

Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.

DanceGRPO: Unleashing GRPO on Visual Generation

cs.CV · 2025-05-12 · unverdicted · novelty 6.0

DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

cs.CV · 2025-03-10 · unverdicted · novelty 6.0

Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.

citing papers explorer

Showing 7 of 7 citing papers.

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation cs.CV · 2026-05-05 · unverdicted · none · ref 31
Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories cs.CV · 2026-04-16 · unverdicted · none · ref 46
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
Step-level Denoising-time Diffusion Alignment with Multiple Objectives cs.LG · 2026-04-15 · unverdicted · none · ref 32
MSDDA derives a closed-form optimal reverse denoising distribution for multi-objective diffusion alignment that is exactly equivalent to step-level RL fine-tuning with no approximation error.
Leveraging Verifier-Based Reinforcement Learning in Image Editing cs.CV · 2026-04-30 · unverdicted · none · ref 53 · 2 links
Edit-R1 builds a CoT-based reasoning reward model (RRM) via SFT and GCPO, then applies it with GRPO to improve image editing models such as FLUX.1-kontext.
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation cs.CV · 2025-10-02 · conditional · none · ref 56
Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.
DanceGRPO: Unleashing GRPO on Visual Generation cs.CV · 2025-05-12 · unverdicted · none · ref 12
DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model cs.CV · 2025-03-10 · unverdicted · none · ref 33
Seedream 2.0 is a native Chinese-English bilingual diffusion model that integrates a self-developed LLM text encoder, Glyph-Aligned ByT5, and Scaled ROPE to reach claimed state-of-the-art results in prompt following, aesthetics, text rendering, and human preference alignment via RLHF.

Diffusion model alignment using direct preference optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer