DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

Arakawa, Riku; Kobayashi, Sosuke; Unno, Yuya; Tsuboi, Yuta; Maeda, Shin-ichi

arXiv:1810.11748 (cs)

[Submitted on 28 Oct 2018]

Abstract: Exploration has been one of the greatest challenges in reinforcement learning (RL) and is a major obstacle to applying RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly because it is difficult to match the agent's actions with rewards that arrive in the distant future. One remedy is to train the agent with real-time feedback from a human observer who immediately gives rewards for some actions. This study tackles a series of challenges in introducing such a human-in-the-loop RL scheme. The first contribution of this work is a set of experiments with a precisely modeled human observer whose feedback is characterized by five properties: binary values, delay, stochasticity, unsustainability, and natural reactions. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in the Maze and Taxi simulated environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application in which a camera automatically recognizes a user's facial expressions and passes them to the agent as feedback while the agent explores a maze.
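The feedback properties listed in the abstract can be illustrated with a small simulated observer. The sketch below is not the authors' implementation: the class name `SimulatedObserver`, its parameters (`delay`, `p_feedback`, `max_steps`), and the way each property is modeled are all assumptions made for illustration. It covers binary, delayed, stochastic, and unsustainable feedback and leaves natural reactions out.

```python
import random
from collections import deque


class SimulatedObserver:
    """Hypothetical feedback model; every parameter here is an illustrative assumption."""

    def __init__(self, delay=2, p_feedback=0.5, max_steps=1000, seed=0):
        self.delay = delay              # steps between an action and its feedback
        self.p_feedback = p_feedback    # chance that the observer responds at all
        self.max_steps = max_steps      # observer stops responding after this many steps
        self.rng = random.Random(seed)
        self.pending = deque()          # judgments waiting to be delivered
        self.steps = 0

    def observe(self, action_was_good):
        """Record one agent step and return the feedback delivered now: +1, -1, or 0."""
        self.steps += 1
        # Binary: a judgment carries only a sign, no magnitude.
        self.pending.append(1 if action_was_good else -1)
        # Delay: nothing is delivered until `delay` further steps have passed.
        if len(self.pending) <= self.delay:
            return 0
        judgment = self.pending.popleft()
        # Unsustainability: no feedback once the observer has stopped responding.
        if self.steps > self.max_steps:
            return 0
        # Stochasticity: feedback is randomly withheld.
        if self.rng.random() >= self.p_feedback:
            return 0
        return judgment
```

An agent loop would call `observe` once per step and treat a return value of 0 as "no feedback", which is why a learner in this setting cannot rely on human feedback alone and still needs the environment's own reward.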

Subjects:	Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Cite as:	arXiv:1810.11748 [cs.HC]
	(or arXiv:1810.11748v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.1810.11748

Submission history

From: Sosuke Kobayashi
[v1] Sun, 28 Oct 2018 02:18:40 UTC (2,339 KB)
