Multi-goal reinforcement learning: Challenging robotics environments and request for research. arXiv preprint arXiv:1802.09464
abstract
The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
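To make the multi-goal interface concrete, the sketch below shows one episode of interaction and a hindsight relabeling step. It is a minimal sketch assuming the original gym robotics API that this report introduced (dict observations with observation, achieved_goal, and desired_goal keys, an exposed compute_reward method, and a four-tuple step return); current releases ship these tasks in gymnasium-robotics under slightly different environment ids and step signatures.

```python
# Minimal interaction sketch for the multi-goal tasks, assuming the
# original gym robotics API (gym <= 0.25, four-tuple step return).
import gym

env = gym.make("FetchPush-v1")
obs = env.reset()
# Observations are dicts; the desired goal is the "additional input"
# that tells the agent what to do.
print(obs["observation"].shape, obs["achieved_goal"], obs["desired_goal"])

episode = []
for _ in range(env._max_episode_steps):
    action = env.action_space.sample()  # stand-in for a learned policy
    next_obs, reward, done, info = env.step(action)  # sparse: -1.0 or 0.0
    episode.append((obs, action, reward, next_obs))
    obs = next_obs

# The exposed compute_reward is what makes Hindsight Experience Replay
# possible: any stored transition can be relabeled with a goal that was
# actually achieved and its sparse reward recomputed. Here, the "final"
# relabeling strategy, using the last achieved goal of the episode:
hindsight_goal = episode[-1][3]["achieved_goal"]
relabeled = [
    (o, a, env.compute_reward(o2["achieved_goal"], hindsight_goal, None), o2)
    for (o, a, r, o2) in episode
]
```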
citing papers
- Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
  A new marginalized reparameterization estimator allows low-variance training of mixture policies in entropy-regularized actor-critic algorithms, matching or exceeding Gaussian policy performance in several continuous control benchmarks (a generic sketch of the mixture-policy setup follows this list).
- Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning
  Introduces RAPCs and a contraction Bellman operator that jointly enforce probabilistic reach-avoid constraints while minimizing expected costs in stochastic RL, with almost-sure convergence to local optima.
- QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
  QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
- Trajectory-Level Data Augmentation for Offline Reinforcement Learning
  Trajectory-based data augmentation exploits geometric relationships between rewards, values, and logging policies to enable effective offline RL from few suboptimal trajectories.
- Middle-mile logistics through the lens of goal-conditioned reinforcement learning
  Middle-mile logistics is cast as a multi-object goal-conditioned MDP and solved by combining graph neural networks with model-free RL via extraction of small feature graphs.
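On the first citing paper above: the baseline it improves can be sketched quickly. Sampling from a mixture policy requires a discrete component choice, which plain reparameterization cannot differentiate through. The snippet below is a generic PyTorch illustration of that standard setup, not the cited paper's marginalized estimator; all function and variable names are illustrative.

```python
# Generic sketch of a Gaussian mixture policy in an entropy-regularized
# actor-critic; this is NOT the cited paper's marginalized estimator.
import torch
import torch.nn.functional as F

def mixture_log_prob(action, logits, means, log_stds):
    """log pi(a|s), marginalized over the K components via logsumexp."""
    # action: (B, D); logits: (B, K); means, log_stds: (B, K, D)
    comp = torch.distributions.Normal(means, log_stds.exp())
    comp_logp = comp.log_prob(action.unsqueeze(1)).sum(-1)      # (B, K)
    log_w = F.log_softmax(logits, dim=-1)                       # (B, K)
    return torch.logsumexp(log_w + comp_logp, dim=-1)           # (B,)

def sample_action(logits, means, log_stds):
    """Pick a component (non-differentiable!), then rsample within it."""
    k = torch.distributions.Categorical(logits=logits).sample()  # (B,)
    idx = k[:, None, None].expand(-1, 1, means.size(-1))         # (B, 1, D)
    mean = means.gather(1, idx).squeeze(1)
    std = log_stds.gather(1, idx).squeeze(1).exp()
    return mean + std * torch.randn_like(mean)

# A SAC-style actor loss would use -mixture_log_prob(a, ...) as the entropy
# bonus, e.g. loss = (alpha * log_pi - q_value).mean(); the discrete draw
# in sample_action is the gradient-variance problem the paper targets.
```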