Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
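As an illustration of what "sparse binary rewards" and "an additional input" can mean in practice, here is a minimal Python sketch. It is not taken from the report: the -1/0 reward convention, the `tolerance` value, and the function names `sparse_reward` and `policy_input` are all assumptions made for this example.

```python
import math


def sparse_reward(achieved_goal, desired_goal, tolerance=0.05):
    """Binary reward: 0.0 when the achieved goal is within `tolerance`
    (Euclidean distance) of the desired goal, otherwise -1.0.

    The -1/0 convention and the default tolerance are assumptions
    chosen for illustration only.
    """
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= tolerance else -1.0


def policy_input(observation, desired_goal):
    """Append the desired goal to the observation, so the agent is
    'told what to do' through an additional input (hypothetical helper)."""
    return list(observation) + list(desired_goal)


if __name__ == "__main__":
    # Example: an object position compared against a target position.
    print(sparse_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.01)))
    print(sparse_reward((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
    print(policy_input([0.1, 0.2], [1.0, 0.0, 0.0]))
```

Because the reward carries no information about how close a failed attempt came, most transitions look identical to the learner. That is the difficulty that goal-relabeling methods such as Hindsight Experience Replay are meant to address.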
Forward citations
Cited by 16 Pith papers
-
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
-
Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous ...
-
Solving Rubik's Cube with a Robot Hand
Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.
-
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
Ada-Diffuser is a causal diffusion model that jointly learns observed interaction structure and underlying latent dynamics from minimal observations for adaptive planning and policy learning.
-
Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
Introduces RAPCs and a contraction Bellman operator for cost-optimal policies that satisfy probabilistic reach-avoid specifications in stochastic MDPs, with almost-sure convergence to local optima.
-
Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
Introduces RAPCs and a contraction Bellman operator that jointly enforce probabilistic reach-avoid constraints while minimizing expected costs in stochastic RL, with almost-sure convergence to local optima.
-
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markov...
-
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
QHyer achieves state-of-the-art results in offline goal-conditioned RL by replacing return-to-go with a state-conditioned Q-estimator and introducing a gated hybrid attention-mamba backbone for content-adaptive histor...
-
Learning Optimal Strategies for Temporal Tasks in Stochastic Games
Model-free RL learns optimal strategies in stochastic games for LTL specs by constructing a product with DPA and assigning rewards/discounts from acceptance conditions.
-
Disentangled Skill Embeddings for Reinforcement Learning
Disentangled Skill Embeddings (DSE) is a variational inference framework for multi-task RL using shared parameters and task-specific latent embeddings for generalization to unseen conditions and as skills in hierarchical RL.
-
Trajectory-Level Data Augmentation for Offline Reinforcement Learning
Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.
-
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Robots outperform constrained human demonstrations by inferring state-only rewards from demos and using temporal interpolation to label and explore better trajectories, achieving 10x faster task completion on a real r...
-
D2 Actor Critic: Diffusion Actor Meets Distributional Critic
D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.
-
Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
SSE improves long-horizon goal-conditioned RL by using failure and partial-success transitions to identify unreliable subgoals, streamline high-level planning, and outperform prior hierarchical methods on benchmarks.
-
Goal-Conditioned Decision Transformer for Multi-Goal Offline Reinforcement Learning
A Goal-Conditioned Decision Transformer is adapted for offline multi-goal RL and shown to outperform online baselines on a new Franka Emika Panda dataset.
-
Middle-mile logistics through the lens of goal-conditioned reinforcement learning
Middle-mile logistics is cast as a multi-object goal-conditioned MDP and solved by combining graph neural networks with model-free RL via extraction of small feature graphs.