DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

Riku Arakawa; Shin-ichi Maeda; Sosuke Kobayashi; Yuta Tsuboi; Yuya Unno

arxiv: 1810.11748 · v1 · pith:HCPGADKWnew · submitted 2018-10-28 · 💻 cs.HC · cs.LG

keywords agent · feedback · dqn-tamer · human · human-in-the-loop · rewards · actions · application

Exploration remains one of the greatest challenges in reinforcement learning (RL) and a major obstacle to applying RL to robotics. Even with state-of-the-art RL algorithms, training a capable agent often requires too many trials, mainly because it is difficult to associate actions with rewards that arrive in the distant future. One remedy is to train the agent with real-time feedback from a human observer who immediately rewards some of its actions. This study tackles a series of challenges in introducing such a human-in-the-loop RL scheme. The first contribution is a set of experiments with a precisely modeled human observer, modeled along five aspects: binary, delay, stochasticity, unsustainability, and natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in the simulated Maze and Taxi environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application in which a camera automatically recognizes a user's facial expressions and passes them as feedback to the agent while it explores a maze.
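
As a rough illustration of four of the observer aspects named in the abstract (binary, delay, stochasticity, unsustainability), the sketch below simulates a feedback source in Python. It is not the authors' model: the class name `SimulatedObserver`, its parameters (`delay`, `p_respond`, `stop_after`), and the interpretation of each aspect are assumptions made here for illustration only. The "natural reaction" aspect is not modeled.

```python
import random
from collections import deque


class SimulatedObserver:
    """Hypothetical simulated observer; every parameter here is an assumption."""

    def __init__(self, delay=2, p_respond=0.5, stop_after=100, seed=0):
        self.delay = delay            # assumed: steps between an action and its feedback
        self.p_respond = p_respond    # assumed: chance the observer reacts at all
        self.stop_after = stop_after  # assumed: step after which feedback ceases
        self.rng = random.Random(seed)
        # Queue of feedback values that have been given but not yet delivered.
        self.pending = deque([0] * delay)
        self.step = 0

    def observe(self, action_was_good):
        """Record one agent action and return the feedback delivered at this step."""
        self.step += 1
        signal = 0  # 0 means "no feedback"
        still_watching = self.step <= self.stop_after          # unsustainability
        if still_watching and self.rng.random() < self.p_respond:  # stochasticity
            signal = 1 if action_was_good else -1               # binary feedback
        self.pending.append(signal)
        return self.pending.popleft()                           # delay


if __name__ == "__main__":
    observer = SimulatedObserver(delay=2, p_respond=1.0, stop_after=3)
    for good in [True, False, True, True, True, True]:
        print(observer.observe(good))
```

With `p_respond=1.0` the observer always reacts, so the run above shows only the delay and the cutoff: feedback for each action surfaces two steps later, and actions taken after step 3 receive none.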

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions
eess.SY 2025-08 unverdicted novelty 2.0

A literature review of safe RL using Lyapunov and barrier functions that identifies a shift to model-free methods since 2017, well-defined open problems per approach class, and high-dimensional scalability as the main...