PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Hämäläinen, Perttu; Babadi, Amin; Ma, Xiaoxiao; Lehtinen, Jaakko

arXiv:1810.02541v3 (cs)

[Submitted on 5 Oct 2018 (v1), revised 18 Dec 2018 (this version, v3), latest version 3 Nov 2020 (v9)]

Abstract: Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, with continuous state and action spaces and a Gaussian policy -- common in computer animation and robotics -- PPO is prone to getting stuck in local optima. In this paper, we observe a tendency of PPO to prematurely shrink the exploration variance, which naturally leads to slow progress. Motivated by this, we borrow ideas from CMA-ES, a black-box optimization method designed for intelligent adaptive Gaussian exploration, to derive PPO-CMA, a novel proximal policy optimization approach that can expand the exploration variance on objective function slopes and shrink the variance when close to the optimum. This is implemented by using separate neural networks for policy mean and variance and training the mean and variance in separate passes. Our experiments demonstrate a clear improvement over vanilla PPO in many difficult OpenAI Gym MuJoCo tasks.
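
The idea of updating the variance and the mean in separate passes can be illustrated with a small toy sketch. It is not the paper's implementation: it uses a state-independent diagonal Gaussian instead of neural networks, a quadratic toy objective instead of an RL task, and it assumes that only samples with positive advantage are used as weights (a CMA-ES-style choice that the abstract does not spell out). The names `separate_pass_update` and `run_toy` are hypothetical.

```python
import numpy as np


def separate_pass_update(mean, var, actions, advantages):
    """One CMA-ES-style update: a variance pass, then a mean pass."""
    # Assumption: only samples with positive advantage contribute.
    weights = np.maximum(advantages, 0.0)
    if weights.sum() <= 0.0:
        return mean, var  # no informative samples, keep the distribution
    weights = weights / weights.sum()
    # Pass 1: fit the variance around the OLD mean, so that good samples
    # lying far along a slope stretch the variance in that direction.
    new_var = weights @ (actions - mean) ** 2
    # Pass 2: move the mean to the weighted average of the good samples.
    new_mean = weights @ actions
    return new_mean, new_var


def run_toy(target, iterations=60, samples=64, seed=0):
    """Maximize -||x - target||^2 with a diagonal Gaussian search distribution."""
    target = np.asarray(target, dtype=float)
    rng = np.random.default_rng(seed)
    mean = np.zeros_like(target)
    var = np.full_like(mean, 0.01)  # deliberately small initial variance
    history = []
    for _ in range(iterations):
        noise = rng.standard_normal((samples, mean.size))
        actions = mean + np.sqrt(var) * noise
        rewards = -np.sum((actions - target) ** 2, axis=1)
        advantages = rewards - rewards.mean()  # mean reward as a crude baseline
        mean, var = separate_pass_update(mean, var, actions, advantages)
        history.append(var.copy())
    return mean, var, np.array(history)


if __name__ == "__main__":
    final_mean, final_var, var_history = run_toy(np.array([3.0, -2.0]))
    print("final mean:", final_mean)
    print("largest variance seen:", var_history.max())
    print("final variance:", final_var)
```

In this sketch the variance is fitted around the old mean before the mean moves, so on a slope the good samples all lie on one side and the variance grows. Near the optimum they cluster around the mean and the variance shrinks.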

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1810.02541 [cs.LG]
	(or arXiv:1810.02541v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1810.02541

Submission history

From: Amin Babadi
[v1] Fri, 5 Oct 2018 06:59:29 UTC (3,580 KB)
[v2] Mon, 8 Oct 2018 07:57:04 UTC (3,759 KB)
[v3] Tue, 18 Dec 2018 09:24:26 UTC (7,226 KB)
[v4] Wed, 16 Jan 2019 09:29:44 UTC (4,696 KB)
[v5] Wed, 23 Jan 2019 21:47:31 UTC (6,301 KB)
[v6] Fri, 24 May 2019 09:16:37 UTC (3,120 KB)
[v7] Tue, 27 Aug 2019 07:34:01 UTC (3,121 KB)
[v8] Mon, 3 Aug 2020 07:19:28 UTC (2,427 KB)
[v9] Tue, 3 Nov 2020 07:51:49 UTC (2,427 KB)
