Taming the Noise in Reinforcement Learning via Soft Updates

Fox, Roy; Pakman, Ari; Tishby, Naftali

arXiv:1512.08562 (cs)

[Submitted on 28 Dec 2015 (v1), last revised 30 Mar 2017 (this version, v4)]

Abstract: Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also makes it avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning results in significant improvements of the convergence rate and the cost of the learning process.
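
The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or update rule: it only compares a hard maximum over noisy estimates with a softened alternative. The log-mean-exp operator, the uniform weighting over actions, the inverse temperature `beta`, and all function names are assumptions chosen for illustration. Every action has a true value of 0, so any nonzero average is pure estimation bias.

```python
import math
import random


def soft_max_value(values, beta):
    """Log-mean-exp of `values` at inverse temperature `beta`.

    Assumed illustrative operator with uniform weights over actions:
    it approaches max(values) as beta grows and the plain mean as beta -> 0.
    """
    m = max(values)
    # Subtract the max before exponentiating for numerical stability.
    s = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(s) / beta


def average_estimates(n_actions=10, n_trials=20000, beta=0.5, seed=0):
    """Average the hard-max and soft-max estimates over many noisy trials."""
    rng = random.Random(seed)
    hard_total = 0.0
    soft_total = 0.0
    for _ in range(n_trials):
        # All actions have true value 0; the estimates differ only by noise.
        noisy = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        hard_total += max(noisy)
        soft_total += soft_max_value(noisy, beta)
    return hard_total / n_trials, soft_total / n_trials


hard_bias, soft_bias = average_estimates()
print(f"hard max bias: {hard_bias:.3f}")
print(f"soft max bias: {soft_bias:.3f}")
```

Under these assumptions the hard maximum overestimates the true value of 0 by a wide margin, while the softened estimate stays much closer to it. Raising `beta` moves the soft estimate toward the hard maximum, which mirrors the idea of penalizing deterministic choices early in learning and relaxing the penalty later.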

Subjects:	Machine Learning (cs.LG); Information Theory (cs.IT)
Cite as:	arXiv:1512.08562 [cs.LG]
	(or arXiv:1512.08562v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1512.08562
Journal reference:	32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016)

Submission history

From: Roy Fox
[v1] Mon, 28 Dec 2015 23:59:12 UTC (841 KB)
[v2] Wed, 25 May 2016 20:33:03 UTC (787 KB)
[v3] Mon, 23 Jan 2017 18:21:49 UTC (787 KB)
[v4] Thu, 30 Mar 2017 05:00:30 UTC (787 KB)
