A maximum-entropy approach to off-policy evaluation in average-reward MDPs

Lazic, Nevena; Yin, Dong; Farajtabar, Mehrdad; Levine, Nir; Gorur, Dilan; Harris, Chris; Schuurmans, Dale

arXiv:2006.12620 (cs)

[Submitted on 17 Jun 2020]

Title: A maximum-entropy approach to off-policy evaluation in average-reward MDPs

Authors: Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Gorur, Chris Harris, Dale Schuurmans

Abstract: This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs). For MDPs that are ergodic and linear (i.e., where rewards and dynamics are linear in some known features), we provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. In a more general setting, when the feature dynamics are approximately linear and for arbitrary rewards, we propose a new approach for estimating stationary distributions with function approximation. We formulate this problem as finding the maximum-entropy distribution subject to matching feature expectations under empirical dynamics. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning. We demonstrate the effectiveness of the proposed OPE approaches in multiple environments.
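
For orientation, the quantity that OPE targets in the average-reward setting is the long-run reward of a target policy, which can be written as the expected reward under that policy's stationary distribution. The following is a minimal sketch of that quantity on a toy tabular MDP with fully known dynamics. It is not the estimator proposed in the paper, which works from off-policy data with function approximation. The transition tensor `P`, reward table `R`, policy `pi`, and the helper names `stationary_distribution` and `average_reward` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def stationary_distribution(P_pi, iters=10_000, tol=1e-12):
    """Power iteration for the row vector d satisfying d = d @ P_pi.

    Assumes the state chain induced by the policy is ergodic, so that
    the stationary distribution exists and is unique.
    """
    n = P_pi.shape[0]
    d = np.full(n, 1.0 / n)
    for _ in range(iters):
        d_next = d @ P_pi
        if np.abs(d_next - d).sum() < tol:
            return d_next
        d = d_next
    return d


def average_reward(P, R, pi):
    """Average reward of policy pi in a known tabular MDP.

    P:  (S, A, S) transition probabilities
    R:  (S, A) expected rewards
    pi: (S, A) action probabilities per state
    """
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state chain under pi
    r_pi = (pi * R).sum(axis=1)            # expected one-step reward per state
    d = stationary_distribution(P_pi)
    return float(d @ r_pi), d


# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # from state 0: action 0, action 1
    [[0.1, 0.9], [0.9, 0.1]],  # from state 1: action 0, action 1
])
R = np.array([[1.0, 1.0], [0.0, 0.0]])   # reward 1 whenever in state 0
pi = np.array([[0.8, 0.2], [0.5, 0.5]])  # target policy

J, d = average_reward(P, R, pi)
print(f"stationary distribution: {d}, average reward: {J:.4f}")
```

In the setting the abstract describes, the dynamics are not known and the stationary distribution must instead be estimated from data collected by a different policy, which is where the maximum-entropy formulation enters.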

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2006.12620 [cs.LG]
	(or arXiv:2006.12620v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.12620

Submission history

From: Nevena Lazic
[v1] Wed, 17 Jun 2020 18:13:37 UTC (3,008 KB)
