Nonlinear Distributional Gradient Temporal-Difference Learning

Qu, Chao; Mannor, Shie; Xu, Huan

arXiv:1805.07732 (cs)

[Submitted on 20 May 2018 (v1), last revised 3 Apr 2019 (this version, v3)]

Abstract: We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been shown to outperform standard expected-value reinforcement learning in a recent study (Bellemare et al., 2017). In the policy evaluation setting, we design two new algorithms, distributional GTD2 and distributional TDC, by applying the Cramér distance to the distributional version of the Bellman error objective function. These algorithms inherit the advantages of both nonlinear gradient TD algorithms and the distributional RL approach. In the control setting, we propose distributional Greedy-GQ using a similar derivation. We prove the asymptotic almost-sure convergence of distributional GTD2 and TDC to a locally optimal solution for general smooth function approximators, which include the neural networks widely used in recent studies to solve real-life RL problems. The per-step computational complexity of each of the three algorithms is linear in the number of parameters of the function approximator, so they can be implemented efficiently for neural networks.
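
As a rough illustration of the Cramér distance named in the abstract, the sketch below computes the squared Cramér distance (the integrated squared difference between two cumulative distribution functions) for two categorical distributions on a shared, sorted grid of atoms. The categorical representation, the function name `cramer_distance_sq`, and the parameters `p`, `q`, and `atoms` are assumptions made for illustration and are not taken from the paper. The sketch does not implement distributional GTD2, TDC, or Greedy-GQ, and it does not build the Bellman error objective.

```python
from itertools import accumulate


def cramer_distance_sq(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    p, q  : probability masses on the same atoms (hypothetical inputs)
    atoms : strictly increasing support points shared by p and q
    """
    if not (len(p) == len(q) == len(atoms)):
        raise ValueError("p, q and atoms must have the same length")
    if any(b <= a for a, b in zip(atoms, atoms[1:])):
        raise ValueError("atoms must be strictly increasing")

    # Cumulative distribution functions evaluated at each atom.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))

    # Both CDFs are piecewise constant between consecutive atoms, so the
    # integral of the squared difference reduces to a weighted sum.
    total = 0.0
    for i in range(len(atoms) - 1):
        width = atoms[i + 1] - atoms[i]
        total += width * (cdf_p[i] - cdf_q[i]) ** 2
    return total


# Point masses at 0 and at 2 on the grid {0, 1, 2}.
print(cramer_distance_sq([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 2.0]))
```

The loop visits each atom once, so the cost is linear in the number of atoms. This is a property of the sketch only and is separate from the abstract's claim about cost in the number of function approximator parameters.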

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1805.07732 [cs.LG]
	(or arXiv:1805.07732v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.07732

Submission history

From: Chao Qu
[v1] Sun, 20 May 2018 08:43:05 UTC (40 KB)
[v2] Sun, 27 Jan 2019 04:58:44 UTC (745 KB)
[v3] Wed, 3 Apr 2019 03:38:05 UTC (750 KB)

