Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

Karmakar, Prasenjit; Bhatnagar, Shalabh

Mathematics > Dynamical Systems

arXiv:1503.09105v9 (math)

[Submitted on 31 Mar 2015 (v1), revised 21 Mar 2016 (this version, v9), latest version 25 Feb 2017 (v14)]

Abstract: We present for the first time an asymptotic convergence analysis of two-timescale stochastic approximation driven by controlled Markov noise. In particular, both the faster and the slower recursions have non-additive Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions on both timescales, defined in terms of the invariant probability measures associated with the controlled Markov processes. Finally, we use our results to solve the off-policy convergence problem for temporal difference learning with linear function approximation, and we prove stability of the iterates in this case. More generally, we emphasize that every reinforcement learning scenario in which the value function is approximated needs to account for Markov noise in its convergence proof.
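As a rough illustration of the setting the abstract describes, the sketch below runs TDC (temporal difference learning with gradient correction, Sutton et al., 2009), a well-known two-timescale off-policy TD method with linear function approximation. It is a swapped-in example, not the algorithm or the proof technique of this paper. The MDP, the target and behavior policies (PI, MU), the features, and the step-size exponents are all hypothetical choices. The auxiliary weights w move on the faster timescale and theta on the slower one (the ratio of step sizes tends to 0), and every sample comes from a single behavior-policy trajectory, so the noise is Markov rather than i.i.d. Because the behavior policy is fixed here, this is ordinary (uncontrolled) Markov noise, a special case of the controlled Markov noise the paper treats; the sketch also simply assumes the iterates stay bounded rather than checking it.

```python
import random

# Hypothetical 3-state, 2-action MDP, chosen only for illustration (not from the paper).
# P[s][a][t] is the probability of moving s -> t under action a; R[s][a] is the reward.
P = [
    [[0.9, 0.1, 0.0], [0.1, 0.0, 0.9]],
    [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0]],
    [[0.1, 0.0, 0.9], [0.0, 0.9, 0.1]],
]
R = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]]
GAMMA = 0.5
PI = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # target policy pi(a|s) (assumed)
MU = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # behavior policy mu(a|s) (assumed)
N_S, N_A = 3, 2
TABULAR = [[1.0 if i == s else 0.0 for i in range(N_S)] for s in range(N_S)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [list(row) + [y[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def td_fixed_point(phi):
    """Off-policy TD(0) fixed point: solve A theta + b = 0, weighting states by d_mu."""
    k = len(phi[0])
    p_mu = [[sum(MU[s][a] * P[s][a][t] for a in range(N_A)) for t in range(N_S)]
            for s in range(N_S)]
    d = [1.0 / N_S] * N_S
    for _ in range(1000):  # power iteration for the behavior chain's stationary law
        d = [sum(d[s] * p_mu[s][t] for s in range(N_S)) for t in range(N_S)]
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s in range(N_S):
        for a in range(N_A):
            wt = d[s] * PI[s][a]  # d_mu(s) * mu(a|s) * rho(s, a) = d_mu(s) * pi(a|s)
            for i in range(k):
                b[i] += wt * R[s][a] * phi[s][i]
                for t in range(N_S):
                    for j in range(k):
                        A[i][j] += wt * P[s][a][t] * phi[s][i] * (GAMMA * phi[t][j] - phi[s][j])
    return solve(A, [-x for x in b]), A, b


def tdc(phi, n_steps=100_000, seed=0):
    """Off-policy TDC along a single behavior trajectory (Markov, not i.i.d., samples)."""
    rng = random.Random(seed)
    k = len(phi[0])
    theta, w = [0.0] * k, [0.0] * k
    s = 0
    for n in range(n_steps):
        fast = 1.0 / (n + 1) ** 0.6  # step size for w (faster timescale)
        slow = 1.0 / (n + 1) ** 0.8  # step size for theta; slow / fast -> 0
        a = rng.choices(range(N_A), weights=MU[s])[0]
        s2 = rng.choices(range(N_S), weights=P[s][a])[0]
        rho = PI[s][a] / MU[s][a]  # importance-sampling ratio
        f, f2 = phi[s], phi[s2]
        delta = R[s][a] + GAMMA * dot(theta, f2) - dot(theta, f)
        fw = dot(f, w)  # computed before either vector is updated
        theta = [th + slow * rho * (delta * x - GAMMA * fw * y)
                 for th, x, y in zip(theta, f, f2)]
        w = [wi + fast * (rho * delta - fw) * x for wi, x in zip(w, f)]
        s = s2
    return theta, w
```

With the tabular features TABULAR, td_fixed_point returns the TD fixed point, which then coincides with the target policy's value function, and tdc(TABULAR) should drift toward it. Passing a feature matrix with fewer columns than states turns the same code into genuine linear function approximation, where off-policy convergence is the delicate case the abstract refers to.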

Comments:	32 pages
Subjects:	Dynamical Systems (math.DS); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1503.09105 [math.DS]
	(or arXiv:1503.09105v9 [math.DS] for this version)
	https://doi.org/10.48550/arXiv.1503.09105

Submission history

From: Prasenjit Karmakar
[v1] Tue, 31 Mar 2015 16:10:55 UTC (32 KB)
[v2] Thu, 2 Apr 2015 17:18:37 UTC (32 KB)
[v3] Thu, 30 Apr 2015 04:11:39 UTC (32 KB)
[v4] Tue, 4 Aug 2015 12:49:32 UTC (33 KB)
[v5] Wed, 5 Aug 2015 14:02:19 UTC (33 KB)
[v6] Thu, 6 Aug 2015 12:53:51 UTC (33 KB)
[v7] Fri, 1 Jan 2016 12:10:22 UTC (33 KB)
[v8] Mon, 18 Jan 2016 15:29:21 UTC (33 KB)
[v9] Mon, 21 Mar 2016 19:25:28 UTC (30 KB)
[v10] Sat, 26 Mar 2016 04:53:48 UTC (30 KB)
[v11] Sun, 17 Apr 2016 13:11:17 UTC (33 KB)
[v12] Thu, 16 Feb 2017 09:37:38 UTC (33 KB)
[v13] Wed, 22 Feb 2017 17:06:39 UTC (33 KB)
[v14] Sat, 25 Feb 2017 18:46:13 UTC (35 KB)