Introduces Block-R1 benchmark, Block-R1-41K dataset, and a conflict score to handle domain-specific optimal block sizes in RL post-training of diffusion LLMs.
Energy-weighted flow matching for offline reinforcement learning
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3representative citing papers
ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.
DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.
GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.
Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
EnFlow integrates flow-based conformer generation with energy landscape modeling to enable joint ensemble generation and ground-state identification using only 1-2 ODE steps.
Energy-Weighted Flow Matching reformulates conditional flow matching with importance sampling to enable continuous normalizing flows to model Boltzmann distributions from energy evaluations alone, with iterative and annealed variants showing competitive performance on benchmarks.
citing papers explorer
-
Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling
Energy-Weighted Flow Matching reformulates conditional flow matching with importance sampling to enable continuous normalizing flows to model Boltzmann distributions from energy evaluations alone, with iterative and annealed variants showing competitive performance on benchmarks.