pith. machine review for the scientific record.

arxiv: 2602.05993 · v2 · submitted 2026-02-05 · 💻 cs.LG · cs.AI

Recognition: unknown

Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps

Douglas Chen, Giri Anantharaman, Ishin Shah, Luca Eyring, Max Simchowitz, Nicholas Matthew Boffi, Peter Holderrieth, Tommi Jaakkola, Yutong He, Zeynep Akata

classification: 💻 cs.LG · cs.AI
keywords: alignment, maps, reward, diamond, efficient, flow, models, arbitrary
abstract

Flow and diffusion models produce high-quality samples, but adapting them to user preferences or constraints post-training remains costly and brittle, a challenge commonly called reward alignment. We argue that efficient reward alignment should be a property of the generative model itself, not an afterthought, and redesign the model for adaptability. We propose "Diamond Maps", stochastic flow map models that enable efficient and accurate alignment to arbitrary rewards at inference time. Diamond Maps amortize many simulation steps into a single-step sampler, like flow maps, while preserving the stochasticity required for optimal reward alignment. This design makes search, Sequential Monte Carlo, and guidance scalable by enabling efficient and consistent estimation of the value function. Our experiments show that Diamond Maps can be learned efficiently via distillation from GLASS Flows, achieve stronger reward alignment performance, and scale better than existing methods. Our results point toward a practical route to generative models that can be rapidly adapted to arbitrary preferences and constraints at inference time.
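To make the mechanism concrete, below is a minimal, hypothetical sketch of the pattern the abstract describes: a stochastic flow map that jumps between arbitrary times in a single network call while injecting fresh noise, combined with Sequential Monte Carlo resampling against a value estimate obtained by jumping each particle to t = 1. All names here (ToyDiamondMap, smc_align, the synthetic reward, the temperature lam) are illustrative stand-ins, not the paper's actual architecture or API.

```python
import torch

# Hypothetical stand-in for a stochastic flow map: one network call maps the
# state at time s directly to a sample at time t, with fresh noise injected so
# that repeated calls from the same state give different samples.
class ToyDiamondMap(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * dim + 2, 128),
            torch.nn.SiLU(),
            torch.nn.Linear(128, dim),
        )

    def forward(self, x_s, s, t):
        eps = torch.randn_like(x_s)                 # preserved stochasticity
        times = torch.stack([s, t], dim=-1)         # (batch, 2) time pair
        return self.net(torch.cat([x_s, eps, times], dim=-1))


@torch.no_grad()
def smc_align(model, reward, n_particles=64, dim=2, n_steps=4, lam=1.0):
    # Sequential Monte Carlo reward alignment. At each intermediate time, a
    # particle's value is estimated cheaply and consistently by jumping it to
    # t = 1 in a single model call and scoring the reward there; particles
    # are then resampled in proportion to exp(lam * value).
    x = torch.randn(n_particles, dim)               # particles at s = 0
    times = torch.linspace(0.0, 1.0, n_steps + 1)
    for i in range(n_steps):
        s = times[i].expand(n_particles)
        t = times[i + 1].expand(n_particles)
        x = model(x, s, t)                          # one stochastic jump s -> t
        x1_hat = model(x, t, torch.ones_like(t))    # single-call value estimate at t = 1
        log_w = lam * reward(x1_hat)                # log importance weights
        idx = torch.multinomial(torch.softmax(log_w, dim=0), n_particles, replacement=True)
        x = x[idx]                                  # keep high-value particles
    return x


# Usage: align unconditional samples toward a simple synthetic reward.
model = ToyDiamondMap(dim=2)
target = torch.tensor([2.0, 2.0])
samples = smc_align(model, reward=lambda x: -((x - target) ** 2).sum(dim=-1))
```

Because the map is stochastic, resampled particles diversify again on the next jump; this is exactly what a deterministic flow map loses, and what the abstract argues is required for optimal reward alignment.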

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Aligning Flow Map Policies with Optimal Q-Guidance

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    Flow map policies enable fast one-step inference for flow-based RL, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving state-of-the-art performance.

  2. Follow the Mean: Reference-Guided Flow Matching

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    Flow matching admits reference-guided control by shifting the conditional endpoint mean, enabling training-free steering of models like FLUX via example banks and a semi-parametric variant on DiT (a toy sketch of the mean-shifting idea follows this list).

  3. Follow the Mean: Reference-Guided Flow Matching

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    Flow matching admits controllable generation by shifting the conditional endpoint mean computed from a reference set, enabling training-free guidance on frozen pretrained models.

  4. Stochastic Transition-Map Distillation for Fast Probabilistic Inference

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    STMD distills the full transition map of diffusion sampling SDEs into a conditional Mean Flow model to enable fast one- or few-step stochastic sampling without teacher models or bi-level optimization.

  5. How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

    cs.LG · 2026-04 · unverdicted · novelty 7.0

    FMRG is a training-free, single-trajectory guidance method for flow models, derived from optimal control, that achieves strong reward alignment with only 3 NFEs (network function evaluations).
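The two "Follow the Mean" entries above describe training-free steering by shifting the conditional endpoint mean toward a reference set. As a toy reading of that one-line summary (not the cited paper's actual method), under linear flow-matching paths x_t = (1 - t) x0 + t x1, the learned velocity implies an endpoint estimate E[x1 | x_t] = x + (1 - t) v, which can be blended with a reference mean and converted back into a guided velocity. The names below (reference_guided_step, ref_mean, alpha, the dummy v_model) are hypothetical:

```python
import torch

@torch.no_grad()
def reference_guided_step(v_model, x, t: float, dt: float, ref_mean, alpha=0.3):
    # One Euler step of a flow-matching sampler with hypothetical reference
    # guidance. Assumes linear interpolation paths x_t = (1 - t) x0 + t x1,
    # under which the velocity v(x, t) implies E[x1 | x_t] = x + (1 - t) v.
    v = v_model(x, t)
    horizon = max(1.0 - t, 1e-3)                           # guard the formula near t = 1
    x1_hat = x + horizon * v                               # implied endpoint estimate
    x1_shifted = (1 - alpha) * x1_hat + alpha * ref_mean   # pull toward the reference mean
    v_guided = (x1_shifted - x) / horizon                  # shifted endpoint back to a velocity
    return x + dt * v_guided


# Usage: steer samples toward the mean of a small stand-in reference bank.
# `v_model` is any callable (x, t) -> velocity; here an untrained dummy network.
dim = 2
net = torch.nn.Sequential(torch.nn.Linear(dim + 1, 64), torch.nn.SiLU(), torch.nn.Linear(64, dim))
v_model = lambda x, t: net(torch.cat([x, torch.full((x.shape[0], 1), t)], dim=-1))
ref_mean = torch.randn(8, dim).mean(dim=0)
x = torch.randn(16, dim)
for i in range(10):
    x = reference_guided_step(v_model, x, t=i / 10, dt=0.1, ref_mean=ref_mean)
```

Since the guidance only rewrites the velocity at sampling time, the pretrained model stays frozen, consistent with the "training-free" framing in those summaries.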