Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

Steckelmacher, Denis; Plisnier, Hélène; Roijers, Diederik M.; Nowé, Ann

arXiv:1903.04193 (cs)

[Submitted on 11 Mar 2019 (v1), last revised 12 Jun 2019 (this version, v2)]

Abstract: Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete actions, with an actor and several off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we compare to Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable, and unusually robust to its hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete, continuous and pixel-based tasks. Source code: this https URL.
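
The abstract's phrase "slowly imitating the average greedy policy of the critics" can be illustrated with a small tabular sketch for a single state. This is not the authors' implementation: it is built only from the abstract's wording, and the function names `greedy_policy` and `actor_step`, the step size `rate`, and the Q-values below are all hypothetical choices made for illustration.

```python
import numpy as np

def greedy_policy(q_values):
    """One-hot distribution putting all mass on the highest-valued action."""
    policy = np.zeros_like(q_values, dtype=float)
    policy[np.argmax(q_values)] = 1.0
    return policy

def actor_step(actor_probs, critics_q, rate=0.05):
    """Move the actor a small step toward the critics' average greedy policy.

    `rate` is an assumed step size; a small value makes the imitation slow.
    """
    target = np.mean([greedy_policy(q) for q in critics_q], axis=0)
    return (1.0 - rate) * actor_probs + rate * target

# Made-up example: one state, three actions, three critics.
actor = np.full(3, 1.0 / 3.0)            # uniform initial actor
critics = np.array([[0.1, 0.9, 0.2],     # critic 1 prefers action 1
                    [0.3, 0.8, 0.1],     # critic 2 prefers action 1
                    [0.7, 0.2, 0.4]])    # critic 3 prefers action 0
updated = actor_step(actor, critics, rate=0.1)
print(updated)
```

Because both the actor's distribution and the averaged greedy target sum to one, the convex combination remains a valid probability distribution. Actions on which the critics disagree keep some probability mass, which is one way to read the abstract's claim about state-specific exploration.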

Comments:	Accepted at the European Conference on Machine Learning 2019 (ECML)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:1903.04193 [cs.LG]
	(or arXiv:1903.04193v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.04193

Submission history

From: Denis Steckelmacher
[v1] Mon, 11 Mar 2019 09:59:58 UTC (170 KB)
[v2] Wed, 12 Jun 2019 13:49:50 UTC (688 KB)
