Large-scale reinforcement learning for diffusion models
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2) · Years: 2026 (2) · Verdicts: UNVERDICTED (2)
Representative citing papers
- Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs
  PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models (a hedged sketch of this setup follows the list).
- Self-Adversarial One Step Generation via Condition Shifting
  APEX derives self-adversarial gradients from condition-shifted velocity fields in flow models to achieve high-fidelity one-step generation, outperforming much larger models and multi-step teachers (see the second sketch after the list).
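
A minimal sketch of the rectified-flow straight-line interpolation and a DPO-style preference loss over noise-tracked pairs. Only the straight-line path x_t = (1 - t) x0 + t x1 and its constant target velocity x1 - x0 come from standard rectified flow; the model signature `model(x_t, t, cond)`, the `beta` temperature, and the winner-vs-loser error comparison are illustrative assumptions, not PNAPO's published objective.

```python
import torch
import torch.nn.functional as F

def rectified_flow_interpolant(x0, x1, t):
    """Straight-line path of rectified flow: x_t = (1 - t) * x0 + t * x1.
    x0 is the tracked prior noise, x1 the data sample; the target
    velocity along this path is constant: v* = x1 - x0."""
    t = t.view(-1, *([1] * (x0.dim() - 1)))  # broadcast t over spatial dims
    return (1 - t) * x0 + t * x1, x1 - x0

def pnapo_style_loss(model, x1_win, x0_win, x1_lose, x0_lose, cond, beta=0.1):
    """Hypothetical DPO-style objective on noise-tracked pairs (an assumed
    reading of PNAPO, not its published form): score each sample by its
    velocity-matching error along the straight-line path, then prefer the
    winner's error to be lower than the loser's."""
    t = torch.rand(x1_win.shape[0], device=x1_win.device)
    xt_w, v_w = rectified_flow_interpolant(x0_win, x1_win, t)
    xt_l, v_l = rectified_flow_interpolant(x0_lose, x1_lose, t)
    err_w = F.mse_loss(model(xt_w, t, cond), v_w, reduction="none").flatten(1).mean(1)
    err_l = F.mse_loss(model(xt_l, t, cond), v_l, reduction="none").flatten(1).mean(1)
    # Bradley-Terry-style preference: push the winner's error below the loser's.
    return -F.logsigmoid(-beta * (err_w - err_l)).mean()
```

Tracking the prior noise per pair is what makes the per-sample velocity target well defined here; without it, x0 would have to be re-sampled or inverted.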
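And a hedged sketch of one plausible reading of "condition shifting": the gap between velocities predicted under the real condition and under a shifted (null) condition, used as a self-adversarial training signal for a one-step student, in the spirit of classifier-free guidance. `velocity_model`, `student`, `t_probe`, and `scale` are hypothetical names, and the whole formulation is an assumption, not APEX's published method.

```python
import torch

@torch.no_grad()
def condition_shifted_gradient(velocity_model, x, t, cond, null_cond, scale=1.0):
    """Assumed reading of condition shifting: the difference between the
    velocity predicted under the real condition and under a shifted/null
    condition points from off-condition samples toward on-condition ones,
    much like classifier-free guidance."""
    v_cond = velocity_model(x, t, cond)
    v_shift = velocity_model(x, t, null_cond)
    return scale * (v_cond - v_shift)

def one_step_student_loss(student, velocity_model, z, t_probe, cond, null_cond):
    """Sketch of using that gap as a self-adversarial signal for a one-step
    generator: nudge student outputs along the condition-shift direction
    evaluated at a probe time t_probe."""
    x = student(z, cond)  # one-step sample from noise z
    g = condition_shifted_gradient(velocity_model, x, t_probe, cond, null_cond)
    # Gradient-matching surrogate: minimizing (x - (x + g).detach())**2
    # moves x along g without backpropagating through the velocity model.
    target = (x + g).detach()
    return torch.mean((x - target) ** 2)
```

The detached-target trick keeps the teacher velocity model frozen, so only the one-step student receives gradients.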