DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
hub
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
14 Pith papers cite this work. Polarity classification is still indexing.
abstract
The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
hub tools
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.
A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks.
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
Introduces RAPCs and a contraction Bellman operator for cost-optimal policies that satisfy probabilistic reach-avoid specifications in stochastic MDPs, with almost-sure convergence to local optima.
Model-free RL learns optimal strategies in stochastic games for LTL specs by constructing a product with DPA and assigning rewards/discounts from acceptance conditions.
Disentangled Skill Embeddings (DSE) is a variational inference framework for multi-task RL using shared parameters and task-specific latent embeddings for generalization to unseen conditions and as skills in hierarchical RL.
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.
Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real robotic arm than behavioral cloning.
D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.
SSE improves long-horizon goal-conditioned RL by using failure and partial-success transitions to identify unreliable subgoals, streamline high-level planning, and outperform prior hierarchical methods on benchmarks.
A Goal-Conditioned Decision Transformer is adapted for offline multi-goal RL and shown to outperform online baselines on a new Franka Emika Panda dataset.
Middle-mile logistics is cast as a multi-object goal-conditioned MDP and solved by combining graph neural networks with model-free RL via extraction of small feature graphs.
citing papers explorer
-
Learning Optimal Strategies for Temporal Tasks in Stochastic Games
Model-free RL learns optimal strategies in stochastic games for LTL specs by constructing a product with DPA and assigning rewards/discounts from acceptance conditions.