Successive Over Relaxation Q-Learning

Kamanchi, Chandramouli; Diddigi, Raghuram Bharadwaj; Bhatnagar, Shalabh

arXiv:1903.03812v1 (cs)

[Submitted on 9 Mar 2019 (this version), latest version 13 Jun 2019 (v3)]

Abstract: In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation, and a fixed point iteration scheme known as value iteration is used to obtain the solution. In [1], a successive over relaxation value iteration scheme is proposed to speed up the computation of the optimal value function. The authors of [1] propose a modified Bellman equation and prove faster convergence to the optimal value function. However, in many practical applications, the model information is not known, and we resort to Reinforcement Learning (RL) algorithms to obtain an optimal policy and the optimal value function. One such popular algorithm is Q-Learning. In this paper, we propose Successive Over Relaxation (SOR) Q-Learning. We first derive a fixed point iteration for the optimal Q-values based on [1] and then use a stochastic approximation scheme to derive a learning algorithm that computes the optimal value function and an optimal policy. We then prove the convergence of SOR Q-Learning to the optimal Q-values. Finally, through numerical experiments, we show that SOR Q-Learning is faster than the Q-Learning algorithm.
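The abstract does not state the update rule itself, so the following is only a minimal sketch of what a tabular relaxed Q-Learning step could look like. It assumes the relaxed target is a weighted combination of the usual Q-Learning target and the current greedy value at the visited state. The function name `sor_q_update`, the symbols `alpha` (step size), `gamma` (discount factor), and `w` (relaxation parameter) are illustrative choices made here, not taken from the abstract. The admissible range of the relaxation parameter is specified in the paper and is not reproduced here.

```python
import numpy as np


def sor_q_update(Q, s, a, r, s_next, alpha, gamma, w):
    """Apply one relaxed Q-Learning step to the table Q in place.

    Assumed form (a sketch, not quoted from the abstract):
    the target mixes the standard Q-Learning target with the
    greedy value at the current state, weighted by w.
    Setting w = 1 recovers the standard Q-Learning update.
    """
    # Standard one-step Q-Learning target from the sampled transition.
    standard_target = r + gamma * np.max(Q[s_next])
    # Relaxation: weight w on the standard target, (1 - w) on max_b Q(s, b).
    relaxed_target = w * standard_target + (1.0 - w) * np.max(Q[s])
    # Stochastic approximation step with step size alpha.
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * relaxed_target
    return Q


# Hypothetical usage on a 2-state, 2-action table.
Q = np.zeros((2, 2))
Q = sor_q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9, w=1.1)
```

Keeping `w` as an explicit argument makes it easy to compare the relaxed update against plain Q-Learning on the same sampled transitions by switching between `w = 1` and a larger value.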

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1903.03812 [cs.LG]
	(or arXiv:1903.03812v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.03812

Submission history

From: Chandramouli Kamanchi
[v1] Sat, 9 Mar 2019 15:03:18 UTC (180 KB)
[v2] Fri, 15 Mar 2019 18:38:53 UTC (186 KB)
[v3] Thu, 13 Jun 2019 18:49:22 UTC (245 KB)
