Meta Learning Shared Hierarchies

John Schulman; Jonathan Ho; Kevin Frans; Pieter Abbeel; Xi Chen

arxiv: 1710.09767 · v1 · pith:QMZO4XRSnew · submitted 2017-10-26 · 💻 cs.LG

Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman

classification 💻 cs.LG

keywords tasks, learning, policies, primitives, shared, hierarchies, problem, robots

abstract

We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives: policies that are executed for large numbers of timesteps. Specifically, a set of primitives is shared within a distribution of tasks, and task-specific policies switch between them. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy.
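The loop described in the abstract (sample a task, reset the task-specific policy, keep the primitives shared) can be illustrated with a small toy sketch. This is not the authors' implementation. Every name here (`train_shared_primitives`, `make_toy`, `reward`) is hypothetical, the "tasks" are just target numbers, and the greedy selection and step-toward-target updates are stand-ins for whatever reinforcement learning method would be used in practice. The order of the two updates inside the inner loop is also an assumption.

```python
import random


def train_shared_primitives(sample_task, reset_master, update_master,
                            update_primitives, primitives,
                            num_tasks, steps_per_task):
    """Outer loop: sample a task, reset the task-specific policy, then adapt.

    The primitives persist across tasks; the master (task-specific) policy
    is re-initialised every time a new task is sampled.
    """
    for _ in range(num_tasks):
        task = sample_task()
        master = reset_master()  # task-specific policy starts from scratch
        for _ in range(steps_per_task):
            # Assumed ordering: adapt the master first, then the primitives.
            master = update_master(master, primitives, task)
            primitives = update_primitives(master, primitives, task)
    return primitives


def reward(value, target):
    """Toy reward: negative distance between a primitive's output and the target."""
    return -abs(value - target)


def make_toy(seed=0):
    """Build toy stand-ins: tasks are targets in {-1, +1}, primitives are scalars."""
    rng = random.Random(seed)

    def sample_task():
        return rng.choice([-1.0, 1.0])

    def reset_master():
        return 0  # the master is just an index into the primitive list

    def update_master(master, primitives, task):
        # Stand-in for an RL update: greedily pick the best primitive.
        return max(range(len(primitives)),
                   key=lambda i: reward(primitives[i], task))

    def update_primitives(master, primitives, task, lr=0.5):
        # Stand-in for an RL update: move the selected primitive toward the target.
        updated = list(primitives)
        updated[master] += lr * (task - updated[master])
        return updated

    return sample_task, reset_master, update_master, update_primitives


learned = train_shared_primitives(*make_toy(), primitives=[-0.1, 0.1],
                                  num_tasks=20, steps_per_task=5)
print(learned)
```

In this toy setting each primitive tends to specialise on one of the two targets, which loosely mirrors the idea of primitives that stay useful across a task distribution while only the switching policy is relearned per task.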

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks
cs.LG 2025-02 unverdicted novelty 6.0

TAVT improves OOD task generalization in meta-RL by preserving task characteristics in virtual tasks via metric learning and using state regularization.

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
cs.LG 2019-06 unverdicted novelty 6.0

RL policies decompose into information-regularized primitives that compete by requesting state information amounts, with the greediest one acting, yielding better generalization than flat or hierarchical baselines.

On mechanisms for transfer using landmark value functions in multi-task lifelong reinforcement learning
cs.LG 2019-07 unverdicted novelty 5.0

Landmark topological coverings derived from traversability metrics enable three transfer mechanisms with theoretical Q-value bounds in goal-based multi-task lifelong RL.

Learning to Cope with Adversarial Attacks
cs.LG 2019-06 unverdicted novelty 5.0

MLAH agent in deep RL demonstrates hierarchical coping mechanisms and improved reward maintenance under spaced adversarial attacks, at the expense of stability.

Neural Embedding for Physical Manipulations
cs.LG 2019-07 unverdicted novelty 4.0

Generative model with normalized pairwise distance constraint discovers output space topologies from sparse data and outperforms GANs and VAEs by avoiding mode collapse.