pith. machine review for the scientific record.

arxiv: 1711.01503 · v1 · submitted 2017-11-04 · 💻 cs.AI

Recognition: unknown

Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning

Authors on Pith: no claims yet
classification 💻 cs.AI
keywords learning · policies · deep · dynamics · learned · meta-policy · previously · algorithm
Original abstract

Rather than learning new control policies for each new task, it is possible, when tasks share some structure, to compose a "meta-policy" from previously learned policies. This paper reports results from experiments using Deep Reinforcement Learning on a continuous-state, discrete-action autonomous driving simulator. We explore how Deep Neural Networks can represent meta-policies that switch among a set of previously learned policies, specifically in settings where the dynamics of a new scenario are composed of a mixture of previously learned dynamics and where the state observation is possibly corrupted by sensing noise. We also report the results of experiments varying dynamics mixes, distractor policies, magnitudes/distributions of sensing noise, and obstacles. In a fully observed experiment, the meta-policy learning algorithm achieves 2.6x the reward achieved by the next best policy composition technique with 80% less exploration. In a partially observed experiment, the meta-policy learning algorithm converges after 50 iterations while a direct application of RL fails to converge even after 200 iterations.
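
The switching scheme in the abstract maps naturally to code. Below is a minimal, hypothetical sketch (assuming PyTorch and a Gym-style continuous-state, discrete-action environment) of a meta-policy that learns Q-values over previously trained sub-policies rather than over low-level actions; none of the class names, layer sizes, or the noise model come from the paper.

```python
# Hypothetical sketch of the meta-policy switching idea; not the authors' code.
import torch
import torch.nn as nn

class MetaPolicy(nn.Module):
    """Q-network over sub-policies: decides WHICH previously learned
    policy to run, not which low-level action to take."""
    def __init__(self, state_dim: int, num_sub_policies: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_sub_policies),  # one Q-value per sub-policy
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def act(meta, sub_policies, state, noise_std=0.0):
    """Pick a sub-policy with the meta-policy, then let that sub-policy
    choose the discrete action. `noise_std` stands in for the abstract's
    sensing noise by corrupting the observation before either network sees it."""
    obs = state + noise_std * torch.randn_like(state)        # corrupted sensing
    with torch.no_grad():
        idx = int(meta(obs).argmax(dim=-1))                  # switching decision
        action = int(sub_policies[idx](obs).argmax(dim=-1))  # low-level action
    return idx, action
```

Training MetaPolicy with any DQN-style update then amounts to ordinary RL over a small discrete "action" space of sub-policies, which is one plausible reason switching can require far less exploration than learning a flat policy from scratch.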

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.
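
Based only on the one-line summary above (the cited paper itself is not reproduced here), the control half of that pipeline can be sketched as follows. The F(s, a, z)ᵀB(s′) factorization is the standard forward-backward setup; the subgoal-selection layer of π-Switch is omitted, and every name in the sketch is hypothetical.

```python
# Hedged sketch: greedy control-policy extraction from a pretrained
# forward-backward (FB) representation. Assumes the successor measure
# M^pi(s, a, s') is approximated by F(s, a, z)^T B(s'), with a task or
# subgoal encoded as z = B(goal). Not the FB pi-Switch algorithm itself.
import numpy as np

def extract_control_policy(F, B, goal, actions):
    """Return a policy that, for a fixed subgoal, picks the action whose
    forward embedding aligns best with the goal's backward embedding."""
    z = B(goal)  # task encoding for this subgoal
    def policy(state):
        scores = [F(state, a, z) @ z for a in actions]  # Q-like score per action
        return actions[int(np.argmax(scores))]
    return policy
```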