Continuous Control With Ensemble Deep Deterministic Policy Gradients

Januszewski, Piotr; Olko, Mateusz; Królikowski, Michał; Świątkowski, Jakub; Andrychowicz, Marcin; Kuciński, Łukasz; Miłoś, Piotr

arXiv:2111.15382 (cs)

[Submitted on 30 Nov 2021]

Abstract: The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand the interplay between individual elements of the RL toolbox. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of a fundamental nature, including: an average of multiple actors trained from the same data boosts performance; the existing methods are unstable across training runs, epochs of training, and evaluation runs; the commonly used additive action noise is not required for effective training; a strategy based on posterior sampling explores better than the approximated UCB combined with the weighted Bellman backup; the weighted Bellman backup alone cannot replace clipped double Q-Learning; the critics' initialization plays the major role in ensemble-based actor-critic exploration. In conclusion, we show how existing tools can be brought together in a novel way, giving rise to the Ensemble Deep Deterministic Policy Gradients (ED2) method, to yield state-of-the-art results on continuous control tasks from OpenAI Gym MuJoCo. From the practical side, ED2 is conceptually straightforward, easy to code, and does not require knowledge outside of the existing RL toolbox.
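
The abstract names three building blocks without showing them: averaging the actions of several actors, clipped double Q-Learning, and exploration in the style of posterior sampling. The sketch below illustrates each in its generic textbook form, using plain Python and toy actors. It is not the ED2 implementation; how the paper combines these pieces, and every name used here (`average_action`, `clipped_double_q_target`, `sample_actor`, the toy actors), is an assumption made for illustration only.

```python
import random


def average_action(actors, state):
    """Average the deterministic actions proposed by every actor in the ensemble."""
    actions = [actor(state) for actor in actors]
    dim = len(actions[0])
    return [sum(a[i] for a in actions) / len(actions) for i in range(dim)]


def clipped_double_q_target(reward, done, gamma, q1_next, q2_next):
    """TD target that bootstraps from the smaller of two critic estimates."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def sample_actor(actors, rng):
    """Act with one randomly drawn ensemble member, a common way to
    approximate posterior sampling with an ensemble (assumed here)."""
    return rng.choice(actors)


# Hypothetical toy actors: each maps a state to a 2-dimensional action.
actors = [
    lambda s: [0.0, 1.0],
    lambda s: [1.0, 0.0],
    lambda s: [0.5, 0.5],
]
state = [0.0]

mean_action = average_action(actors, state)
target = clipped_double_q_target(
    reward=1.0, done=0.0, gamma=0.5, q1_next=2.0, q2_next=4.0
)
behaviour_actor = sample_actor(actors, random.Random(0))
```

In this toy setup the averaged action would be used at evaluation time and the sampled actor during data collection; whether ED2 splits the roles exactly this way should be checked against the paper itself.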

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2111.15382 [cs.LG]
(or arXiv:2111.15382v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2111.15382

Submission history

From: Piotr Januszewski
[v1] Tue, 30 Nov 2021 13:28:13 UTC (5,783 KB)
