The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

Patel, Nishil; Lee, Sebastian; Mannelli, Stefano Sarao; Goldt, Sebastian; Saxe, Andrew

arXiv:2306.10404 (cs)

[Submitted on 17 Jun 2023 (v1), last revised 2 Sep 2023 (this version, v5)]

Abstract: Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty - analogous to annealing schemes and curricula during training in RL - and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.

Comments:	10 pages, 7 figures, Preprint
Subjects:	Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Cite as:	arXiv:2306.10404 [cs.LG]
	(or arXiv:2306.10404v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.10404

Submission history

From: Nishil Patel
[v1] Sat, 17 Jun 2023 18:16:51 UTC (28,698 KB)
[v2] Wed, 21 Jun 2023 16:38:04 UTC (28,698 KB)
[v3] Tue, 27 Jun 2023 10:37:55 UTC (28,698 KB)
[v4] Wed, 19 Jul 2023 09:17:09 UTC (28,698 KB)
[v5] Sat, 2 Sep 2023 14:24:52 UTC (28,698 KB)
