Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning

Barkley, Brett; Fridovich-Keil, David

Abstract: Dyna-style off-policy model-based reinforcement learning (DMBRL) algorithms are a family of techniques for generating synthetic state transition data and thereby enhancing the sample efficiency of off-policy RL algorithms. This paper identifies and investigates a surprising performance gap observed when applying DMBRL algorithms across different benchmark environments with proprioceptive observations. We show that, while DMBRL algorithms perform well in OpenAI Gym, their performance can drop significantly in DeepMind Control Suite (DMC), even though these settings offer similar tasks and identical physics backends. Modern techniques designed to address several key issues that arise in these settings do not provide a consistent improvement across all environments, and overall our results show that adding synthetic rollouts to the training process -- the backbone of Dyna-style algorithms -- significantly degrades performance across most DMC environments. Our findings contribute to a deeper understanding of several fundamental challenges in model-based RL and show that, as in many optimization fields, there is no free lunch when evaluating performance across diverse benchmarks in RL.
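
To make the Dyna idea in the abstract concrete, here is a minimal sketch of classic tabular Dyna-Q, in which each real transition is followed by several Q-learning updates on transitions replayed from a learned model. It is not the authors' method or code: the paper studies deep off-policy algorithms on Gym and DMC, whereas the chain environment, the function name `dyna_q`, and every hyperparameter below are hypothetical choices made only for illustration.

```python
import random


def dyna_q(n_states=6, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Dyna-Q on a toy deterministic chain (illustrative assumption)."""
    rng = random.Random(seed)
    actions = (0, 1)          # 0 = move left, 1 = move right
    goal = n_states - 1       # reaching the last state ends the episode
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    model = {}                # (s, a) -> (reward, next_state), from real data

    def step(s, a):
        # Real environment: deterministic chain, reward 1 on reaching the goal.
        s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        return (1.0 if s_next == goal else 0.0), s_next

    def update(s, a, r, s_next):
        # One-step Q-learning update; no bootstrapping from the terminal state.
        bootstrap = 0.0 if s_next == goal else max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * bootstrap - q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                best = max(q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if q[(s, b)] == best])
            r, s_next = step(s, a)
            update(s, a, r, s_next)        # learn from the real transition
            model[(s, a)] = (r, s_next)    # record it in the learned model
            for _ in range(planning_steps):
                # Synthetic updates: replay transitions predicted by the model.
                ps, pa = rng.choice(list(model))
                pr, ps_next = model[(ps, pa)]
                update(ps, pa, pr, ps_next)
            s = s_next
    return q


if __name__ == "__main__":
    q_values = dyna_q()
    for state in range(5):
        print(state, round(q_values[(state, 0)], 3), round(q_values[(state, 1)], 3))
```

In this toy setting the model is exact, so the synthetic updates can only help. The degradation the abstract reports concerns learned, imperfect models in continuous-control tasks, which this sketch does not attempt to reproduce.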

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2412.14312 [cs.LG]
	(or arXiv:2412.14312v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.14312

