arxiv: 1809.02925 · v2 · pith:42ELHNGZnew · submitted 2018-09-09 · 💻 cs.LG · stat.ML

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

classification 💻 cs.LG stat.ML
keywords algorithms, learning, reward, adversarial, algorithm, bias, discriminator-actor-critic, expert

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can also lead to sub-optimal behavior in others. Secondly, even though these algorithms can learn from few expert demonstrations, they require a prohibitively large number of interactions with the environment in order to imitate the expert for many real-world applications. In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. Furthermore, since our reward function is designed to be unbiased, we can apply our algorithm to many problems without making any task-specific adjustments.
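The reward bias mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formulas, so the two forms below are an assumption: they are the rewards commonly derived from a discriminator output D in adversarial imitation learning, and the function names are hypothetical. One form is positive for every D in (0, 1), which rewards simply staying alive; the other is negative for every D in (0, 1), which rewards ending the episode early.

```python
import math

def survival_biased_reward(d):
    # Assumed form: -log(1 - D). Strictly positive for D in (0, 1),
    # so every extra step adds reward regardless of behavior quality.
    return -math.log(1.0 - d)

def penalty_biased_reward(d):
    # Assumed form: log(D). Strictly negative for D in (0, 1),
    # so every extra step costs reward regardless of behavior quality.
    return math.log(d)

# Hypothetical discriminator outputs, from "looks like the policy"
# to "looks like the expert".
discriminator_outputs = [0.1, 0.5, 0.9]

positive_rewards = [survival_biased_reward(d) for d in discriminator_outputs]
negative_rewards = [penalty_biased_reward(d) for d in discriminator_outputs]

print(positive_rewards)
print(negative_rewards)
```

In both cases the sign of the reward is fixed before any learning happens, which is one way a reward function can favor longer or shorter episodes independently of how well the expert is imitated.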

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Direct Preference Optimization for Primitive-Enabled Hierarchical RL: A Bilevel Approach

    cs.LG 2024-11 unverdicted novelty 6.0

    DIPPER uses bi-level optimization and DPO to train the higher-level policy from stationary preference comparisons and value regularization, claiming up to 40% gains on robotic navigation and manipulation tasks while i...