Inverse Reinforcement Learning via Matching of Optimality Profiles

Haug, Luis; Ovinnikov, Ivan; Bykovets, Eugene

Computer Science > Machine Learning

arXiv:2011.09264 (cs)

[Submitted on 18 Nov 2020 (v1), last revised 19 Nov 2020 (this version, v2)]

Abstract: The goal of inverse reinforcement learning (IRL) is to infer a reward function that explains the behavior of an agent performing a task. Most approaches assume that the demonstrated behavior is near-optimal. In many real-world scenarios, however, examples of truly optimal behavior are scarce, and it is desirable to effectively leverage sets of demonstrations of suboptimal or heterogeneous performance, which are easier to obtain. We propose an algorithm that learns a reward function from such demonstrations together with a weak supervision signal in the form of a distribution over rewards collected during the demonstrations (or, more generally, a distribution over cumulative discounted future rewards). We view such distributions, which we also refer to as optimality profiles, as summaries of the degree of optimality of the demonstrations that may, for example, reflect the opinion of a human expert. Given an optimality profile and a small amount of additional supervision, our algorithm fits a reward function, modeled as a neural network, by essentially minimizing the Wasserstein distance between the distribution that this reward function induces on the demonstrations and the optimality profile. We show that our method is capable of learning reward functions such that policies trained to optimize them outperform the demonstrations used for fitting the reward functions.
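The distribution-matching step described above can be illustrated with a toy sketch. It is not the authors' implementation, and several simplifications are assumptions made here for illustration only: the reward is linear in hand-made state features instead of a neural network; the optimality profile is an equal-size sample of per-state rewards; the distance is the squared 2-Wasserstein distance, which in one dimension is computed by matching sorted samples; optimization is plain gradient descent with the sorted matching recomputed at every step; and the "small amount of additional supervision" mentioned in the abstract is not modeled. The names `wasserstein_1d_pow`, `linear_reward` and `fit_reward` are hypothetical.

```python
def wasserstein_1d_pow(xs, ys, p=2):
    """Return W_p^p between two equal-size empirical distributions on the
    real line. In 1D the optimal coupling pairs the i-th smallest sample of
    xs with the i-th smallest sample of ys."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) ** p for x, y in pairs) / len(xs)


def linear_reward(theta, phi):
    """Hypothetical linear reward r_theta(s) = theta . phi(s)."""
    return sum(t * f for t, f in zip(theta, phi))


def fit_reward(features, profile, lr=0.05, steps=2000):
    """Fit theta so that the rewards assigned to the demonstration states
    match the optimality profile in squared 2-Wasserstein distance."""
    n, d = len(features), len(features[0])
    target = sorted(profile)
    theta = [0.0] * d
    for _ in range(steps):
        rewards = [linear_reward(theta, phi) for phi in features]
        # Optimal 1D coupling: rank states by their current reward.
        order = sorted(range(n), key=lambda i: rewards[i])
        grad = [0.0] * d
        for rank, i in enumerate(order):
            # d/dtheta of (r_theta(s_i) - q_rank)^2 / n, coupling held fixed.
            residual = rewards[i] - target[rank]
            for k in range(d):
                grad[k] += 2.0 * residual * features[i][k] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Toy usage: features are (bias, x); the profile lists rewards 2x - 1
# in shuffled order, i.e. it is unpaired with the states.
features = [[1.0, x] for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
profile = [0.5, -1.0, 1.0, -0.5, 0.0]
theta = fit_reward(features, profile)
fitted = [linear_reward(theta, phi) for phi in features]
print(theta, wasserstein_1d_pow(fitted, profile))
```

Because the objective compares only distributions, any reward that assigns the same multiset of values to the demonstration states fits equally well (for this symmetric toy data, a reward with the opposite slope would also reach zero distance). This ambiguity suggests why some additional supervision is useful alongside the profile.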

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2011.09264 [cs.LG]
	(or arXiv:2011.09264v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.09264

Submission history

From: Luis Haug
[v1] Wed, 18 Nov 2020 13:23:43 UTC (2,076 KB)
[v2] Thu, 19 Nov 2020 08:55:03 UTC (2,076 KB)