Reinforcement Learning Algorithm Selection

Laroche, Romain; Feraud, Raphael

arXiv:1701.08810 (stat)

[Submitted on 30 Jan 2017 (v1), last revised 14 Nov 2017 (this version, v3)]

Abstract: This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning. The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximize the expected return. The article presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality considering the structural sampling budget limitations. ESBAS is first empirically evaluated on a dialogue task where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting where algorithms update their policies after each transition, which we call SSBAS. SSBAS is evaluated on a fruit collection task where it is shown to adapt the stepsize parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves the performance by a wide margin.
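The epochal principle described in the abstract (freeze the policies for an epoch, then let a freshly reset stochastic bandit pick which algorithm controls each episode) can be illustrated with a minimal sketch. Several details here are assumptions not stated in the abstract: the bandit is a textbook UCB1, epoch lengths double (epoch b lasts 2^b episodes), all algorithms learn from one shared pool of trajectories, and `ToyLearner` is a hypothetical stand-in for a real off-policy RL algorithm whose policy quality depends only on how much data it has seen. It is not the authors' implementation.

```python
import math
import random


class UCB1:
    """Textbook UCB1 bandit over k arms (an assumed choice of stochastic bandit)."""

    def __init__(self, k):
        self.counts = [0] * k
        self.sums = [0.0] * k

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class ToyLearner:
    """Hypothetical stand-in for an off-policy RL algorithm."""

    def __init__(self, ceiling, rate):
        self.ceiling = ceiling    # best mean return this learner can reach
        self.rate = rate          # how quickly it improves with more data
        self.frozen_mean = 0.0    # mean return of the currently frozen policy

    def update_policy(self, num_trajectories):
        # Toy model: policy quality grows with the size of the shared data set.
        self.frozen_mean = self.ceiling * (1.0 - math.exp(-self.rate * num_trajectories))

    def run_episode(self, rng):
        # Episode return in {0, 1}, drawn with the frozen policy's mean.
        return 1.0 if rng.random() < self.frozen_mean else 0.0


def esbas_sketch(learners, num_epochs, rng):
    """Run the epochal selection loop; returns shared data and per-epoch pick counts."""
    data = []             # trajectories shared by all (off-policy) learners
    picks_per_epoch = []
    for epoch in range(num_epochs):
        # Policies are updated only here, then stay frozen for the whole epoch.
        for learner in learners:
            learner.update_policy(len(data))
        bandit = UCB1(len(learners))      # the bandit is rebooted at each epoch
        picks = [0] * len(learners)
        for _ in range(2 ** epoch):       # assumed doubling epoch length
            arm = bandit.select()
            episode_return = learners[arm].run_episode(rng)
            bandit.update(arm, episode_return)
            data.append((arm, episode_return))
            picks[arm] += 1
        picks_per_epoch.append(picks)
    return data, picks_per_epoch


if __name__ == "__main__":
    rng = random.Random(0)
    # One slow learner with a high ceiling, one fast learner with a low ceiling.
    learners = [ToyLearner(0.9, 0.05), ToyLearner(0.6, 0.5)]
    _, picks = esbas_sketch(learners, 8, rng)
    for epoch, counts in enumerate(picks):
        print(epoch, counts)
```

Because the policies do not change within an epoch, each arm's return distribution is stationary for that epoch, which is the setting a stochastic bandit expects. Resetting the bandit at each epoch boundary discards statistics gathered under policies that have since been updated.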

Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:1701.08810 [stat.ML]
	(or arXiv:1701.08810v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1701.08810

Submission history

From: Romain Laroche
[v1] Mon, 30 Jan 2017 20:13:17 UTC (548 KB)
[v2] Fri, 2 Jun 2017 19:20:40 UTC (5,423 KB)
[v3] Tue, 14 Nov 2017 21:08:17 UTC (5,537 KB)
