Strictly Batch Imitation Learning by Energy-based Distribution Matching

Jarrett, Daniel; Bica, Ioana; van der Schaar, Mihaela

arXiv:2006.14154 (stat)

[Submitted on 25 Jun 2020 (v1), last revised 14 Jan 2021 (this version, v2)]

Abstract: Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This *strictly batch imitation learning* problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e. respecting action conditionals), implicitly learn from rollout dynamics (i.e. leveraging state marginals), and -- crucially -- operate in an entirely offline fashion. To address this challenge, we propose a novel technique by *energy-based distribution matching* (EDM): By identifying parameterizations of the (discriminative) model of a policy with the (generative) energy function for state distributions, EDM yields a simple but effective solution that equivalently minimizes a divergence between the occupancy measure for the demonstrator and a model thereof for the imitator. Through experiments with application to control and healthcare settings, we illustrate consistent performance gains over existing algorithms for strictly batch imitation learning.
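As a rough illustration of what it can mean to identify a policy's parameterization with an energy function over states, the sketch below lets one set of logits define both quantities. It is only a minimal sketch under stated assumptions: the abstract does not spell out the construction, so the softmax policy, the negative log-sum-exp energy, the linear model, and every name here (`theta`, `logits`, `policy`, `state_energy`, `bc_loss`) are hypothetical choices that follow the common joint energy-based-model pattern. It also omits the sampling from the state model that training the energy term would require.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def logits(theta, states):
    # Hypothetical linear model: theta has shape (state_dim, n_actions).
    return states @ theta

def policy(theta, states):
    # Discriminative view: pi(a | s) is the softmax of the logits.
    f = logits(theta, states)
    return np.exp(f - logsumexp(f)[:, None])

def state_energy(theta, states):
    # Generative view (assumed form): E(s) = -logsumexp_a f(s)[a],
    # so an unnormalized state density is exp(-E(s)).
    return -logsumexp(logits(theta, states))

def bc_loss(theta, states, actions):
    # Negative log-likelihood of demonstrated actions under the policy.
    f = logits(theta, states)
    log_pi = f - logsumexp(f)[:, None]
    return -np.mean(log_pi[np.arange(len(actions)), actions])

# Toy demonstration data (synthetic, for illustration only).
rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))
states = rng.normal(size=(5, 4))
actions = rng.integers(0, 3, size=5)

pi = policy(theta, states)
energy = state_energy(theta, states)
loss = bc_loss(theta, states, actions)
```

Under these assumptions the logits decompose as f(s)[a] = log pi(a | s) - E(s), which is why a single parameter set can carry both the action conditionals and a model of the state marginal.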

Comments:	In Proc. 34th International Conference on Neural Information Processing Systems (NeurIPS 2020)
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2006.14154 [stat.ML]
	(or arXiv:2006.14154v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2006.14154

Submission history

From: Daniel Jarrett
[v1] Thu, 25 Jun 2020 03:27:59 UTC (1,121 KB)
[v2] Thu, 14 Jan 2021 17:54:32 UTC (1,113 KB)
