Online Learning in Kernelized Markov Decision Processes

Chowdhury, Sayak Ray; Gopalan, Aditya

arXiv:1805.08052 (cs)

[Submitted on 21 May 2018 (v1), last revised 3 Jan 2019 (this version, v2)]

Abstract: We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated Reproducing Kernel Hilbert Spaces of functions induced by symmetric positive semidefinite (psd) kernels (the frequentist setting), we show that the algorithms enjoy sublinear regret bounds. The bounds are stated in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandits, and they highlight the influence of transition and reward function structure on learning performance. Our results apply to multidimensional state and action spaces with composite kernel structures, and they generalize results from the literature on kernelized bandits and on the adaptive control of parametric linear dynamical systems with quadratic costs.
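
As a rough illustration of the ingredients the abstract names, the sketch below computes a Gaussian process posterior over a function of (state, action) inputs and the standard information gain quantity from the kernelized bandit literature, 0.5 * log det(I + K / noise_var). It is not the paper's algorithm, and it does not reproduce the paper's generalized information gain. The RBF kernel, the function names, and the parameters `lengthscale` and `noise_var` are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel between rows of X and rows of Y (assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_posterior(X_train, y_train, X_query, noise_var=0.1, lengthscale=1.0):
    """GP posterior mean and variance at X_query; each row is a concatenated (state, action)."""
    K = rbf_kernel(X_train, X_train, lengthscale)
    K_star = rbf_kernel(X_query, X_train, lengthscale)
    A = K + noise_var * np.eye(len(X_train))
    mean = K_star @ np.linalg.solve(A, y_train)
    # Prior variance of the RBF kernel is 1; subtract the explained part k*^T A^{-1} k*.
    var = 1.0 - np.sum(K_star * np.linalg.solve(A, K_star.T).T, axis=1)
    return mean, np.maximum(var, 0.0)


def information_gain(X, noise_var=0.1, lengthscale=1.0):
    """Standard bandit-style information gain of the inputs X: 0.5 * log det(I + K / noise_var)."""
    K = rbf_kernel(X, X, lengthscale)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / noise_var)
    return 0.5 * logdet


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(20, 3))  # hypothetical 2-d state plus 1-d action
    y = np.sin(X.sum(axis=1)) + 0.1 * rng.standard_normal(20)  # hypothetical noisy rewards
    mean, var = gp_posterior(X, y, X[:5])
    print("posterior mean:", mean)
    print("posterior variance:", var)
    print("information gain:", information_gain(X))
```

An optimistic (UCRL-style) learner would typically add a multiple of the posterior standard deviation to the posterior mean to form an upper confidence bound, whereas a posterior sampling learner would draw a function from the posterior instead; the information gain controls how quickly such confidence widths can shrink.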

Comments:	22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1805.08052 [cs.LG]
	(or arXiv:1805.08052v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.08052

Submission history

From: Sayak Ray Chowdhury
[v1] Mon, 21 May 2018 13:44:10 UTC (64 KB)
[v2] Thu, 3 Jan 2019 03:30:24 UTC (84 KB)
