Energy-weighted flow matching for offline reinforcement learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu · 2025 · arXiv 2503.04975

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 3

citation-polarity summary

background 2 unclear 1

representative citing papers

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.

ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

Energy-Guided Generative Modeling for Low-Energy Molecular Structure Discovery

cs.LG · 2025-12-27 · unverdicted · novelty 6.0

EnFlow integrates flow-based conformer generation with energy landscape modeling to enable joint ensemble generation and ground-state identification using only 1-2 ODE steps.

Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

stat.ML · 2025-09-03 · unverdicted · novelty 6.0

Energy-Weighted Flow Matching reformulates conditional flow matching with importance sampling to enable continuous normalizing flows to model Boltzmann distributions from energy evaluations alone, with iterative and annealed variants showing competitive performance on benchmarks.

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

cs.LG · 2026-05-07

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03

citing papers explorer

Showing 9 of 9 citing papers.

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models cs.LG · 2026-05-12 · unverdicted · none · ref 73 · 2 links
Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching cs.RO · 2026-04-13 · unverdicted · none · ref 25
ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning cs.RO · 2025-06-18 · unverdicted · none · ref 44
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models cs.LG · 2026-05-28 · unverdicted · none · ref 61
GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.
Fisher Decorator: Refining Flow Policy via a Local Transport Map cs.LG · 2026-04-20 · unverdicted · none · ref 9
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
Energy-Guided Generative Modeling for Low-Energy Molecular Structure Discovery cs.LG · 2025-12-27 · unverdicted · none · ref 77
EnFlow integrates flow-based conformer generation with energy landscape modeling to enable joint ensemble generation and ground-state identification using only 1-2 ODE steps.
Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling stat.ML · 2025-09-03 · unverdicted · none · ref 41
Energy-Weighted Flow Matching reformulates conditional flow matching with importance sampling to enable continuous normalizing flows to model Boltzmann distributions from energy evaluations alone, with iterative and annealed variants showing competitive performance on benchmarks.
Path-Coupled Bellman Flows for Distributional Reinforcement Learning cs.LG · 2026-05-07 · unreviewed · ref 36
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unreviewed · ref 74

Energy-weighted flow matching for offline reinforcement learning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer