ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

Jesson, Andrew; Lu, Chris; Gupta, Gunshi; Filos, Angelos; Foerster, Jakob Nicolaus; Gal, Yarin

arXiv:2306.01460v1 (cs)

[Submitted on 2 Jun 2023 (this version), latest version 24 Nov 2023 (v3)]

Abstract: In this paper, we introduce a novel method for enhancing the effectiveness of on-policy Deep Reinforcement Learning (DRL) algorithms. Current on-policy algorithms, such as Proximal Policy Optimization (PPO) and Asynchronous Advantage Actor-Critic (A3C), do not sufficiently account for cautious interaction with the environment. Our method addresses this gap by explicitly integrating cautious interaction in two critical ways: by maximizing a lower bound on the true value function plus a constant, thereby promoting conservative value estimation, and by incorporating Thompson sampling for cautious exploration. These features are realized through three surprisingly simple modifications to the A3C algorithm: processing advantage estimates through a ReLU function, spectral normalization, and dropout. We provide a theoretical proof that our algorithm maximizes the lower bound, which also grounds Regret Matching Policy Gradients (RMPG), a discrete-action on-policy method for multi-agent reinforcement learning. Our rigorous empirical evaluations across various benchmarks consistently demonstrate our approach's improved performance against existing on-policy algorithms. This research represents a substantial step towards more cautious and effective DRL algorithms, which have the potential to unlock applications to complex, real-world problems.
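
The first of the three modifications named in the abstract, passing advantage estimates through a ReLU, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `positive_advantage_loss`, the use of return minus value as the advantage estimate, and the averaging over the batch are all assumptions made for illustration, and spectral normalization and dropout are omitted entirely.

```python
import math


def relu(x):
    """Rectified linear unit: negative inputs map to zero."""
    return max(0.0, x)


def positive_advantage_loss(log_probs, returns, values):
    """Policy-gradient style loss that keeps only positive advantages.

    Hypothetical helper, assuming:
      log_probs[i]: log pi(a_i | s_i) for the action taken at step i
      returns[i]:   an estimate of the return from step i
      values[i]:    the critic's value estimate for s_i
    The advantage is taken as returns[i] - values[i] (an assumption; the
    abstract does not specify the estimator).
    """
    if not (len(log_probs) == len(returns) == len(values)):
        raise ValueError("inputs must have equal length")
    total = 0.0
    for lp, g, v in zip(log_probs, returns, values):
        advantage = g - v
        # Steps with a negative advantage contribute nothing to the loss.
        total += -lp * relu(advantage)
    return total / len(log_probs)


# Two steps: the first has advantage +0.5, the second -0.5 (clipped to 0).
log_probs = [math.log(0.5), math.log(0.25)]
returns = [1.0, 0.0]
values = [0.5, 0.5]
loss = positive_advantage_loss(log_probs, returns, values)
```

In this toy batch only the first step influences the loss, since the second step's negative advantage is zeroed by the ReLU before it is weighted by the log-probability.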

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2306.01460 [cs.LG]
	(or arXiv:2306.01460v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.01460

Submission history

From: Andrew Jesson
[v1] Fri, 2 Jun 2023 11:37:22 UTC (9,839 KB)
[v2] Mon, 12 Jun 2023 18:49:29 UTC (10,070 KB)
[v3] Fri, 24 Nov 2023 22:31:07 UTC (18,797 KB)
