Residual policy learning
6 Pith papers cite this work, all from 2026. Polarity classification is still indexing, so all 6 citations are currently unverdicted.
Representative citing papers (illustrative sketches of several of the described mechanisms follow the list):
- Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular Controllers
  A two-stage framework augments human-object interaction (HOI) data with dynamic priors, then blends pre-trained dynamic-motion and static-interaction agents via a composer network, enabling long-term dynamic interactions with higher success rates and reduced training time.
- When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning
  Q2RL extracts Q-functions from BC policies with minimal environment interaction and applies Q-gating to enable stable offline-to-online RL, outperforming baselines on manipulation benchmarks and reaching up to 100% success on a real robot.
- Fisher Decorator: Refining Flow Policy via a Local Transport Map
  Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.
- AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models
  AnchorRefine factorizes VLA action generation into a trajectory anchor for coarse planning and residual refinement for local corrections, improving success rates by up to 7.8% in simulation and 18% on real robots across LIBERO, CALVIN, and physical tasks.
- Detecting is Easy, Adapting is Hard: Local Expert Growth for Visual Model-Based Reinforcement Learning under Distribution Shift
  JEPA-Indexed Local Expert Growth adds local action corrections for detected shift clusters and yields statistically significant OOD gains on four shift conditions while keeping in-distribution performance intact.
- Incremental Residual Reinforcement Learning Toward Real-World Learning for Social Navigation
  IRRL lets robots learn social navigation in the real world by incrementally updating only the differences from a base policy, matching replay-buffer methods in simulation and adapting to new settings on physical robots.
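Several of the entries above (IRRL, AnchorRefine, and the local-expert approach) build on the residual pattern named in this work's title: a frozen base policy proposes a nominal action and a small learned network adds a bounded correction. Below is a minimal sketch of that pattern; the names (ResidualPolicy, obs_dim, act_dim, scale) are illustrative and do not come from any of the cited papers.

```python
# Minimal sketch of the residual policy pattern: frozen base policy + small
# learned correction head. Illustrative only; not any specific paper's code.
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    def __init__(self, base: nn.Module, obs_dim: int, act_dim: int, scale: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # keep the pre-trained base policy frozen
            p.requires_grad_(False)
        self.residual = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )
        self.scale = scale                 # bound the magnitude of the correction

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        base_action = self.base(obs)
        delta = self.residual(torch.cat([obs, base_action], dim=-1))
        return base_action + self.scale * delta

if __name__ == "__main__":
    base = nn.Sequential(nn.Linear(8, 2), nn.Tanh())   # stand-in pre-trained policy
    policy = ResidualPolicy(base, obs_dim=8, act_dim=2)
    print(policy(torch.randn(4, 8)).shape)             # torch.Size([4, 2])
```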
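The Q2RL entry mentions "Q-gating" for stable offline-to-online RL without spelling out the rule. One plausible gating rule is sketched below: execute the online policy's action only when a Q-estimate (assumed here to be distilled from the BC policy) does not rate it clearly worse than the BC action. The rule and the margin hyperparameter are assumptions for illustration, not the paper's exact mechanism.

```python
# Hedged sketch of a Q-gating rule: fall back to the BC action whenever the
# online action looks worse under a (BC-derived) Q-estimate by more than a margin.
import numpy as np

def q_gated_action(obs, online_policy, bc_policy, q_fn, margin=0.05):
    a_online = online_policy(obs)
    a_bc = bc_policy(obs)
    if q_fn(obs, a_online) >= q_fn(obs, a_bc) - margin:
        return a_online            # online action passes the gate
    return a_bc                    # otherwise stay on the safe BC action

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=4)
    online = lambda o: np.tanh(o[:2] + 0.3)            # toy online policy
    bc = lambda o: np.tanh(o[:2])                      # toy BC policy
    q = lambda o, a: float(-np.sum((a - 0.1) ** 2))    # toy Q-estimate
    print(q_gated_action(obs, online, bc, q))
```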
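The "Fisher-matrix quadratic approximation of the KL constraint" cited for Fisher Decorator is presumably the standard second-order expansion of the KL divergence around a reference policy: for parameters θ near θ₀,

KL(π_θ ‖ π_θ₀) ≈ ½ (θ − θ₀)ᵀ F(θ₀) (θ − θ₀),

where F(θ₀) is the Fisher information matrix (the expansion is the same to second order for either argument order, since the first-order term vanishes). This turns the KL trust region into a locally quadratic penalty that is cheap to evaluate, which is consistent with the claim of controllable error near the optimum.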
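The full-body motion agent entry describes blending pre-trained controllers through a composer network. The sketch below shows one way such a composer can work, producing softmax blend weights over frozen experts' actions; the class and dimension names are illustrative, and the actual paper may blend at a different level (e.g., per-joint or in a latent space).

```python
# Hedged sketch of a composer network that blends frozen pre-trained experts
# by weighting their actions per step. Illustrative only.
import torch
import torch.nn as nn

class ComposedController(nn.Module):
    def __init__(self, experts, obs_dim: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        for e in self.experts:                 # pre-trained experts stay frozen
            for p in e.parameters():
                p.requires_grad_(False)
        self.composer = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, len(experts)),       # one logit per expert
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.composer(obs), dim=-1)             # (B, K)
        actions = torch.stack([e(obs) for e in self.experts], dim=-1)   # (B, A, K)
        return (actions * weights.unsqueeze(1)).sum(dim=-1)             # (B, A)

if __name__ == "__main__":
    experts = [nn.Sequential(nn.Linear(10, 3), nn.Tanh()) for _ in range(2)]
    ctrl = ComposedController(experts, obs_dim=10)
    print(ctrl(torch.randn(5, 10)).shape)                               # torch.Size([5, 3])
```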