Title resolution pending

Ding, Z · 2023 · arXiv 2309.16984

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2 baseline 1

citation-polarity summary

background 2 baseline 1

representative citing papers

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

QGF performs test-time policy optimization for flow models in RL by guiding a behavior-cloned reference policy with value-function gradients, achieving strong results on high-dimensional offline RL benchmarks without additional policy training.

Reinforcement Learning for Flow-Matching Policies with Density Transport

cs.LG · 2026-06-07 · unverdicted · novelty 7.0

RLDT fine-tunes pretrained flow-matching policies for continuous control by aligning them to a max-entropy RL transport field constructed via SVGD, using expected-target estimation for stable multi-step updates.

Aligning Flow Map Policies with Optimal Q-Guidance

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.

Reinforcement Learning via Value Gradient Flow

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

BFQ enables single-step noise-to-action mapping in offline RL by dividing flow-path displacements into bootstrappable short-range components learned from marginal velocity.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

Diffusion Policy Policy Optimization

cs.RO · 2024-09-01 · unverdicted · novelty 6.0

DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Aligning Flow Map Policies with Optimal Q-Guidance cs.LG · 2026-05-12 · unverdicted · none · ref 8
Flow map policies enable fast one-step inference for flow-based RL policies, and FMQ provides an optimal closed-form Q-guided target for offline-to-online adaptation under trust-region constraints, achieving SOTA performance.
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unverdicted · none · ref 43
FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer