Intelligent Trainer for Model-Based Reinforcement Learning

Li, Yuanlong; Dong, Linsen; Zhou, Xin; Wen, Yonggang; Guan, Kyle

arXiv:1805.09496 (cs)

[Submitted on 24 May 2018 (v1), last revised 5 Jun 2019 (this version, v6)]

Abstract: Model-based reinforcement learning (MBRL) has been proposed as a promising alternative for tackling the high sampling cost of canonical reinforcement learning (RL), by leveraging a learned model to generate synthesized data for policy training. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly learning the control policy and configuring hyper-parameters (e.g., global/local models, real and synthesized data). The training process can be tedious and prohibitively costly. In this research, we propose a "reinforcement on reinforcement" (RoR) architecture that decomposes the convoluted tasks into two layers of reinforcement learning. The inner layer is the canonical model-based RL training process environment (TPE), which learns the control policy for the underlying system and exposes interfaces to access states, actions and rewards. The outer layer is an RL agent, called the AI trainer, which learns an optimal hyper-parameter configuration for the inner TPE. This decomposition provides the flexibility to implement different trainer designs, an approach we call "train the trainer". We propose and optimize two alternative trainer designs: 1) a uni-head trainer and 2) a multi-head trainer. The proposed RoR framework is evaluated on five tasks in OpenAI Gym (Pendulum, Mountain Car, Reacher, Half Cheetah and Swimmer). Compared with three baseline algorithms, the proposed Train-the-Trainer algorithm achieves competitive auto-tuning performance, with up to 56% expected sampling cost saving without knowing the best parameter setting in advance. The proposed trainer framework can be easily extended to other cases in which hyper-parameter tuning is costly.
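
The two-layer decomposition described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class names `ToyTPE` and `BanditTrainer`, the function `run_ror`, the single hyper-parameter (a fraction of real versus synthesized data), the made-up reward shape, and the use of a simple epsilon-greedy bandit in place of the paper's RL-based uni-head or multi-head trainers. The inner layer only exposes a reset/step interface, and the outer layer treats each hyper-parameter choice as an action.

```python
import random


class ToyTPE:
    """Hypothetical stand-in for the inner training process environment.

    A real TPE would run model-based RL training; this toy version only
    returns a made-up reward that peaks at an unknown best data ratio.
    """

    def __init__(self, best_ratio=0.6, seed=0):
        self.best_ratio = best_ratio      # unknown to the trainer
        self.rng = random.Random(seed)
        self.progress = 0.0

    def reset(self):
        self.progress = 0.0
        return self.progress

    def step(self, ratio):
        # Toy dynamics (assumption): reward falls off linearly with the
        # distance from the best ratio, plus small bounded noise.
        noise = self.rng.uniform(-0.05, 0.05)
        reward = 1.0 - abs(ratio - self.best_ratio) + noise
        self.progress += reward
        return self.progress, reward


class BanditTrainer:
    """Outer-layer agent, simplified to a stateless epsilon-greedy bandit."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.actions)
        self.values = [0.0] * len(self.actions)

    def act(self):
        # Try every action once before exploiting the estimates.
        for i, count in enumerate(self.counts):
            if count == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        return max(range(len(self.actions)), key=lambda i: self.values[i])

    def update(self, index, reward):
        # Incremental mean of the rewards observed for this action.
        self.counts[index] += 1
        self.values[index] += (reward - self.values[index]) / self.counts[index]


def run_ror(steps=200):
    """Run the outer trainer against the inner toy TPE."""
    tpe = ToyTPE()
    trainer = BanditTrainer(actions=[0.2, 0.4, 0.6, 0.8, 1.0])
    tpe.reset()
    for _ in range(steps):
        index = trainer.act()
        _, reward = tpe.step(trainer.actions[index])
        trainer.update(index, reward)
    return trainer


if __name__ == "__main__":
    trained = run_ror()
    best = max(range(len(trained.actions)), key=lambda i: trained.values[i])
    print("preferred data ratio:", trained.actions[best])
```

The point of the sketch is the interface boundary: the trainer never inspects the inner training loop and only sees the reward returned by `step`, which is what makes it possible to swap in a different trainer design without touching the inner layer.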

Comments:	13 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1805.09496 [cs.LG]
	(or arXiv:1805.09496v6 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.09496

Submission history

From: Yuanlong Li
[v1] Thu, 24 May 2018 03:08:40 UTC (1,031 KB)
[v2] Tue, 29 May 2018 09:14:20 UTC (1,031 KB)
[v3] Thu, 27 Dec 2018 03:22:35 UTC (1,468 KB)
[v4] Sun, 10 Mar 2019 05:13:36 UTC (3,591 KB)
[v5] Sat, 23 Mar 2019 13:45:03 UTC (3,591 KB)
[v6] Wed, 5 Jun 2019 13:02:28 UTC (3,602 KB)