GPU-Parallel Multi-Task Reinforcement Learning with Demonstration Guided Policy Optimization

Junjie Lai; Qiwei Wu; Renjing Xu; Rui Zhang; Tao Li; Weihua Zhang; Yunrong Guo; Zhengyu Zhang

arxiv: 2606.03335 · v1 · pith:RRAV4LVInew · submitted 2026-06-02 · 💻 cs.RO

Rui Zhang, Qiwei Wu, Zhengyu Zhang, Tao Li, Yunrong Guo, Junjie Lai, Renjing Xu, Weihua Zhang

keywords: task, demonstration, gpu-parallel, learning, reinforcement, dgpo, guided, multi-task

Large-scale GPU-parallel reinforcement learning has changed what can be trained in robot simulation, yet most systems still optimize one specialist policy per task. We propose a construction methodology for turning structured manipulation task families into GPU-parallel multi-task RL benchmarks, and instantiate it as MT-Libero using LIBERO assets and task predicates in Isaac Lab. The resulting benchmark supports simultaneous reinforcement learning over heterogeneous task suites with parallel rendering, physics randomization, and state-input or visual-input policies. To make such training practical under sparse success signals and limited prior data, we further propose DGPO, an on-policy demonstration-guided method that combines importance-weighted PPO with adaptive behavior cloning on matched demonstration actions. DGPO enables a tunable preference toward demonstrated task distributions, outperforming both prior-free RL and existing demonstration-based methods while preserving the stability and online-improvement benefits of on-policy PPO.
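
The abstract describes DGPO only at a high level, as importance-weighted PPO combined with adaptive behavior cloning on demonstration actions. The sketch below is one generic way such a combined objective could be written, not the paper's actual loss: it uses the standard PPO clipped surrogate, a negative log-likelihood behavior-cloning term, and a purely hypothetical weighting schedule (`adaptive_bc_weight`) that shrinks the cloning term as the task success rate rises. All function names, the schedule, and the default hyperparameters are assumptions made for illustration.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize."""
    ratio = np.exp(logp_new - logp_old)  # importance weight pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


def bc_loss(logp_demo_actions):
    """Behavior cloning as negative log-likelihood of demonstration actions."""
    return -np.mean(logp_demo_actions)


def adaptive_bc_weight(success_rate, max_weight=1.0):
    """Hypothetical schedule (an assumption, not from the paper):
    rely on demonstrations less as the policy succeeds more often."""
    return max_weight * (1.0 - float(np.clip(success_rate, 0.0, 1.0)))


def combined_loss(logp_new, logp_old, advantages, logp_demo_actions,
                  success_rate, clip_eps=0.2, max_bc_weight=1.0):
    """PPO surrogate plus a success-dependent behavior-cloning penalty."""
    weight = adaptive_bc_weight(success_rate, max_bc_weight)
    rl_term = ppo_clip_loss(logp_new, logp_old, advantages, clip_eps)
    return rl_term + weight * bc_loss(logp_demo_actions)


if __name__ == "__main__":
    logp_old = np.array([-1.0, -2.0])
    logp_new = np.array([-1.0, -2.0])       # unchanged policy, ratio = 1
    advantages = np.array([1.0, -1.0])
    logp_demo = np.array([-1.0, -3.0])      # policy log-probs of demo actions
    print(combined_loss(logp_new, logp_old, advantages, logp_demo,
                        success_rate=0.5))
```

In this sketch `max_bc_weight` plays the role of a tunable preference toward the demonstrated behavior: setting it to zero recovers plain PPO, while larger values pull the policy more strongly toward the demonstration actions early in training.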
