Steering diffusion models with quadratic rewards: a fine-grained analysis, February 2026

Ankur Moitra, Andrej Risteski, Dhruv Rohatgi · 2026 · arXiv 2602.16570

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

cs.LG · 2026-04-29 · unverdicted · novelty 8.0 · 3 refs

FMRG reformulates guidance as deterministic optimal control, deriving a single-trajectory method using the flow map that matches or exceeds baselines on reward-guided generation and inverse problems with 3 NFEs at text-to-image scale.

Are we really tilting? The mechanics of reward guidance in flow and diffusion models

cs.LG · 2026-06-01 · unverdicted · novelty 6.0

Finite-particle approximation of the Doob h-function causes reward hacking via two failure modes in reward-guided diffusion; a damping schedule corrects within-mode bias in Gaussian settings.

Technical Note on Relating Scores of Tilted Distributions

math.ST · 2026-04-29 · unverdicted · novelty 4.0

Extends score relations for tilted distributions to constant negative diagonal tilts by linking denoisers via Tweedie's formula, yielding location and time shifts in the score operator.

citing papers explorer

Showing 3 of 3 citing papers.

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance · cs.LG · 2026-04-29 · unverdicted · none · ref 16 · 3 links
Are we really tilting? The mechanics of reward guidance in flow and diffusion models · cs.LG · 2026-06-01 · unverdicted · none · ref 46
Technical Note on Relating Scores of Tilted Distributions · math.ST · 2026-04-29 · unverdicted · none · ref 4
