pith. machine review for the scientific record.

arxiv: 1802.09464 · v2 · submitted 2018-02-26 · 💻 cs.LG · cs.AI · cs.RO

Recognition: unknown

Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research

Alex Ray, Bob McGrew, Bowen Baker, Glenn Powell, Jonas Schneider, Josh Tobin, Maciek Chociej, Marcin Andrychowicz, Matthias Plappert, Peter Welinder, Vikash Kumar, Wojciech Zaremba

classification 💻 cs.LG cs.AI cs.RO
keywords multi-goal, tasks, challenging, learning, reinforcement, research, robotics, additional

The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
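The multi-goal setup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the report's actual code: the function names, the `tolerance` value, the 0 / -1 encoding of the binary reward, and the dictionary layout of an episode are all hypothetical. The relabeling function shows one simple hindsight strategy (substituting the final achieved goal), which is only one way the idea behind Hindsight Experience Replay might be realized.

```python
import math


def sparse_reward(achieved_goal, desired_goal, tolerance=0.05):
    """Binary reward: 0.0 when the goal is reached within `tolerance`, else -1.0.

    The 0 / -1 encoding and the tolerance are illustrative assumptions.
    """
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= tolerance else -1.0


def policy_input(observation, desired_goal):
    """Append the goal to the observation: the 'additional input' for the agent."""
    return list(observation) + list(desired_goal)


def relabel_with_final_goal(episode, tolerance=0.05):
    """Hindsight relabeling sketch: treat the last achieved goal as the intended one.

    `episode` is a hypothetical list of dicts with the keys
    'achieved_goal', 'desired_goal', and 'reward'.
    """
    new_goal = episode[-1]["achieved_goal"]
    return [
        {
            **step,
            "desired_goal": new_goal,
            "reward": sparse_reward(step["achieved_goal"], new_goal, tolerance),
        }
        for step in episode
    ]


# A failed episode: the agent never gets near the desired goal [2.0, 0.0].
episode = [
    {"achieved_goal": [0.0, 0.0], "desired_goal": [2.0, 0.0], "reward": -1.0},
    {"achieved_goal": [0.5, 0.0], "desired_goal": [2.0, 0.0], "reward": -1.0},
    {"achieved_goal": [1.0, 0.0], "desired_goal": [2.0, 0.0], "reward": -1.0},
]
relabeled = relabel_with_final_goal(episode)
```

After relabeling, the final transition carries a reward of 0.0 because the substituted goal is, by construction, the one the agent actually reached. This is why relabeling can supply a learning signal even when every original reward was -1.0.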

This paper has not been read by Pith yet.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

    cs.LG 2026-05 unverdicted novelty 7.0

    A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous ...

  2. Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 6.0

    Introduces RAPCs and a contraction Bellman operator that jointly enforce probabilistic reach-avoid constraints while minimizing expected costs in stochastic RL, with almost-sure convergence to local optima.

  3. QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

    cs.LG 2026-05 unverdicted novelty 6.0

    QHyer achieves state-of-the-art results in offline goal-conditioned RL by replacing return-to-go with a state-conditioned Q-estimator and introducing a gated hybrid attention-mamba backbone for content-adaptive histor...

  4. QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

    cs.LG 2026-05 unverdicted novelty 6.0

    QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markov...

  5. Trajectory-Level Data Augmentation for Offline Reinforcement Learning

    cs.LG 2026-05 unverdicted novelty 5.0

    Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.

  6. Middle-mile logistics through the lens of goal-conditioned reinforcement learning

    stat.ML 2026-05 unverdicted novelty 4.0

    Middle-mile logistics is cast as a multi-object goal-conditioned MDP and solved by combining graph neural networks with model-free RL via extraction of small feature graphs.