Backplay: "Man muss immer umkehren"

Resnick, Cinjon; Raileanu, Roberta; Kapoor, Sanyam; Peysakhovich, Alex; Cho, Kyunghyun; Bruna, Joan

arXiv:1807.06919v2 (cs)

[Submitted on 18 Jul 2018 (v1), revised 5 Aug 2018 (this version, v2), latest version 21 Apr 2022 (v5)]

Abstract: A long-standing problem in model-free reinforcement learning (RL) is that it requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to increase the sample efficiency of RL when we have access to demonstrations. Our approach, which we call Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. We perform experiments in a competitive four-player game (Pommerman) and a path-finding maze game. We find that this weak form of guidance provides significant gains in sample complexity, with a stark advantage in sparse-reward environments. In some cases, standard RL did not yield any improvement, while Backplay reached success rates greater than 50% and generalized to unseen initial conditions in the same amount of training time. Additionally, we see that agents trained via Backplay can learn policies superior to those of the original demonstration.
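
The curriculum described in the abstract can be illustrated with a minimal sketch. It assumes the demonstration is stored as a list of states indexed from 0 (the initial state) to `demo_len - 1` (the final state). The function names, the fixed-size sliding window, and the parameters `epochs_per_stage` and `step` are hypothetical choices made for illustration; the abstract does not specify the schedule, so this should not be read as the authors' implementation.

```python
import random


def backplay_window(demo_len, epoch, epochs_per_stage=10, step=2):
    """Return (lo, hi): how many steps back from the end of the
    demonstration the start state may lie at this training epoch.

    Assumption: the window slides back by `step` every
    `epochs_per_stage` epochs and is clamped at the initial state.
    """
    stage = epoch // epochs_per_stage
    last = demo_len - 1  # largest possible distance from the end
    lo = min(last, stage * step)
    hi = min(last, (stage + 1) * step)
    return lo, hi


def sample_start_index(demo_len, epoch, rng, epochs_per_stage=10, step=2):
    """Sample the index of the demonstration state to start an episode from."""
    lo, hi = backplay_window(demo_len, epoch, epochs_per_stage, step)
    back = rng.randint(lo, hi)  # inclusive on both ends
    return demo_len - 1 - back  # index 0 is the environment's initial state


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in (0, 10, 20, 30, 40, 50):
        print(epoch, backplay_window(10, epoch), sample_start_index(10, epoch, rng))
```

Early in training the sampled start states sit close to the end of the demonstration, where the reward is only a few steps away. Once the window is clamped at the initial state, every episode starts from the environment's usual initial state, which matches the end point of the curriculum in the abstract.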

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1807.06919 [cs.LG]
	(or arXiv:1807.06919v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1807.06919

Submission history

From: Cinjon Resnick
[v1] Wed, 18 Jul 2018 13:28:59 UTC (2,202 KB)
[v2] Sun, 5 Aug 2018 21:09:36 UTC (3,811 KB)
[v3] Fri, 28 Sep 2018 20:13:45 UTC (2,846 KB)
[v4] Mon, 31 Dec 2018 15:16:18 UTC (3,792 KB)
[v5] Thu, 21 Apr 2022 14:03:32 UTC (3,792 KB)
