Steering diffusion models with quadratic rewards: a fine-grained analysis, February 2026
3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: background (1)
Polarities: background (1)
Citing papers

- How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
  FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.
- Are we really tilting? The mechanics of reward guidance in flow and diffusion models
  Finite-particle approximation of the Doob h-function causes reward hacking via two failure modes in reward-guided diffusion; a damping schedule corrects within-mode bias in Gaussian settings.
- Technical Note on Relating Scores of Tilted Distributions
  Extends score relations for tilted distributions to constant negative diagonal tilts by linking denoisers via Tweedie's formula, yielding location and time shifts in the score operator.