Albergo, and Yee Whye Teh

Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S · 2026 · stat.ML · arXiv 2601.14430

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open full Pith review browse 5 citing papers arXiv PDF

abstract

Controlling generative models is computationally expensive. This is because optimal alignment with a reward function--whether via inference-time steering or fine-tuning--requires estimating the value function. This task demands access to the conditional posterior $p_{1|t}(x_1|x_t)$, the distribution of clean data $x_1$ consistent with an intermediate state $x_t$, a requirement that typically compels methods to resort to costly trajectory simulations. To address this bottleneck, we introduce Meta Flow Maps (MFMs), a framework extending consistency models and flow maps into the stochastic regime. MFMs are trained to perform stochastic one-step posterior sampling, generating arbitrarily many i.i.d. draws of clean data $x_1$ from any intermediate state. Crucially, these samples provide a differentiable reparametrization that unlocks efficient value function estimation. We leverage this capability to solve bottlenecks in both paradigms: enabling inference-time steering without inner rollouts, and facilitating unbiased, off-policy fine-tuning to general rewards. Empirically, our single-particle steered-MFM sampler outperforms a Best-of-1000 baseline on ImageNet across multiple rewards at a fraction of the compute.

citation-role summary

background 3 other 1

citation-polarity summary

background 3 unclear 1

representative citing papers

Aligning Flow Map Policies with Optimal Q-Guidance

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

ABC enables any-subset autoregressive generation of continuous stochastic processes via non-Markovian diffusion bridges that track physical time and allow path-dependent conditioning.

Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Derives RAM, a reward-adjusted consistency loss extending diffusion pretraining regression to efficient KL-regularized RL post-training, achieving peak rewards up to 50x faster than Flow-GRPO on Stable Diffusion 3.5M.

Measure-to-measure Regression with Transformers

cs.LG · 2026-05-27 · unverdicted · novelty 5.0

Formalizes nonlinear M2M regression and introduces transformer architectures as static maps and dynamic velocity fields between probability measures, tested on synthetic, particle, and organoid datasets.

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

cs.LG · 2026-04-29 · 2 refs

citing papers explorer

Showing 5 of 5 citing papers.

Aligning Flow Map Policies with Optimal Q-Guidance cs.LG · 2026-05-12 · unverdicted · none · ref 36 · internal anchor
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space cs.LG · 2026-04-30 · unverdicted · none · ref 48 · internal anchor
ABC enables any-subset autoregressive generation of continuous stochastic processes via non-Markovian diffusion bridges that track physical time and allow path-dependent conditioning.
Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models cs.LG · 2026-05-11 · unverdicted · none · ref 6 · 2 links · internal anchor
Derives RAM, a reward-adjusted consistency loss extending diffusion pretraining regression to efficient KL-regularized RL post-training, achieving peak rewards up to 50x faster than Flow-GRPO on Stable Diffusion 3.5M.
Measure-to-measure Regression with Transformers cs.LG · 2026-05-27 · unverdicted · none · ref 21 · internal anchor
Formalizes nonlinear M2M regression and introduces transformer architectures as static maps and dynamic velocity fields between probability measures, tested on synthetic, particle, and organoid datasets.
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance cs.LG · 2026-04-29 · unreviewed · ref 53 · 2 links · internal anchor

Albergo, and Yee Whye Teh

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer