ConQUR: Mitigating Delusional Bias in Deep Q-learning

Su, Andy; Ooi, Jayden; Lu, Tyler; Schuurmans, Dale; Boutilier, Craig

arXiv:2002.12399 (cs)

[Submitted on 27 Feb 2020]

Abstract: Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
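
The penalization idea in the abstract can be pictured with a small sketch. The abstract does not state the penalty's form, so everything below is an illustrative assumption rather than the authors' formulation: a hinge-style term that grows whenever some action's Q-value exceeds that of the action assigned to a state, added to a squared TD error with a hypothetical weight `lam`. The function and parameter names (`consistency_penalty`, `penalized_loss`, `assigned_actions`) are likewise invented for illustration.

```python
from typing import Sequence


def consistency_penalty(
    q_values: Sequence[Sequence[float]],
    assigned_actions: Sequence[int],
) -> float:
    """Assumed hinge penalty: sum of max(0, Q(s, a') - Q(s, assigned action)).

    It is zero exactly when the greedy action in every state agrees with
    the action that earlier Q-labels assumed for that state.
    """
    total = 0.0
    for q_row, a_assigned in zip(q_values, assigned_actions):
        for q in q_row:
            total += max(0.0, q - q_row[a_assigned])
    return total


def penalized_loss(
    q_pred: Sequence[float],
    targets: Sequence[float],
    q_values: Sequence[Sequence[float]],
    assigned_actions: Sequence[int],
    lam: float,
) -> float:
    """Squared TD error plus lam times the assumed consistency penalty."""
    td_error = sum((t - q) ** 2 for q, t in zip(q_pred, targets))
    return td_error + lam * consistency_penalty(q_values, assigned_actions)


# Toy batch: two states with three actions each; both were assigned action 0.
q_values = [[1.0, 2.0, 0.5], [3.0, 1.0, 0.0]]
assigned = [0, 0]
penalty = consistency_penalty(q_values, assigned)
loss = penalized_loss([1.0, 2.0], [1.5, 2.0], q_values, assigned, lam=0.5)
```

In the toy batch, only the first state violates its assignment (action 1 outscores the assigned action 0), so only that state contributes to the penalty. A soft penalty of this kind nudges the approximator toward consistency without requiring the exhaustive search the abstract attributes to earlier methods.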

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2002.12399 [cs.LG]
	(or arXiv:2002.12399v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2002.12399

Submission history

From: Jayden Ooi
[v1] Thu, 27 Feb 2020 19:22:51 UTC (7,324 KB)
