SePPO: Semi-policy preference optimization for diffusion alignment

· 2024 · arXiv 2410.05255

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.

ViPO: Visual Preference Optimization at Scale

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

Poly-DPO improves robustness to noisy preference data in visual models, and the new ViPO dataset enables superior performance, with the method reducing to standard DPO on high-quality data.

Reinforcement Learning for Scalable and Trustworthy Intelligent Systems

cs.LG · 2026-05-08 · unverdicted · novelty 3.0

Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.

citing papers explorer

Showing 3 of 3 citing papers.

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories cs.CV · 2026-04-16 · unverdicted · none · ref 60
LeapAlign fine-tunes flow matching models by constructing two consecutive leaps that skip multiple ODE steps with randomized timesteps and consistency weighting, enabling stable updates at any generation step.
ViPO: Visual Preference Optimization at Scale cs.CV · 2026-04-27 · unverdicted · none · ref 27
Poly-DPO improves robustness to noisy preference data in visual models, and the new ViPO dataset enables superior performance, with the method reducing to standard DPO on high-quality data.
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems cs.LG · 2026-05-08 · unverdicted · none · ref 195
Reinforcement learning is advanced for communication-efficient federated optimization and for preference-aligned, contextually safe policies in large language models.

SePPO: Semi-policy preference optimization for diffusion alignment

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer