Repeated Inverse Reinforcement Learning for AI Safety

Amin, Kareem; Jiang, Nan; Singh, Satinder

arXiv:1705.05427v1 (cs)

[Submitted on 15 May 2017 (this version), latest version 4 Nov 2017 (v3)]

Abstract: How detailed should we make the goals we prescribe to AI agents acting on our behalf in complex environments? Detailed, low-level specifications of goals can be tedious and expensive to create, while abstract, high-level goals can lead to negative surprises, because the agent may find behaviors that we would not want it to perform, that is, unsafe AI. One approach to addressing this dilemma is for the agent to infer human goals by observing human behavior. This is the Inverse Reinforcement Learning (IRL) problem. However, IRL is generally ill-posed, because there are typically many reward functions for which the observed behavior is optimal. While the use of heuristics to select from among the set of feasible reward functions has led to successful applications of IRL to learning from demonstration, such heuristics do not address AI safety. In this paper we introduce a novel repeated IRL problem that captures an aspect of AI safety as follows. The agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human. Each time the human is surprised, the agent is provided a demonstration of the desired behavior by the human. We formalize this problem, including how the sequence of tasks is chosen, in a few different ways and provide some foundational results.
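
The ill-posedness mentioned in the abstract can be seen in a toy setting. The sketch below is not taken from the paper: it assumes a hypothetical one-step task with three actions, where a reward function is just a list of three numbers and the human is observed choosing action 2. The helper name `is_optimal` and all reward values are illustrative assumptions.

```python
def is_optimal(reward, action):
    """Return True if `action` attains the maximum reward in a one-step task."""
    return reward[action] == max(reward)

# Hypothetical observation: the human picks action 2.
demonstrated_action = 2

# Hypothetical candidate reward functions (one value per action).
candidate_rewards = [
    [0.0, 0.5, 1.0],    # a "natural" reward
    [0.0, 5.0, 10.0],   # the same reward scaled by a positive constant
    [3.0, 3.5, 4.0],    # the same reward shifted by a constant
    [-1.0, 0.9, 1.0],   # a different ranking of the other actions
    [0.0, 0.0, 0.0],    # degenerate: every action is optimal
]

# Every candidate explains the demonstration equally well.
consistent = [r for r in candidate_rewards if is_optimal(r, demonstrated_action)]
print(len(consistent), "of", len(candidate_rewards), "candidates are consistent")
```

All five candidates make the observed choice optimal, so a single demonstration cannot single out one of them. They disagree about the other actions, however, so they could recommend different behavior in a new task where action 2 is not available.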

Comments:	The first two authors contributed equally to this work
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:1705.05427 [cs.AI]
	(or arXiv:1705.05427v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1705.05427

Submission history

From: Nan Jiang
[v1] Mon, 15 May 2017 20:06:35 UTC (59 KB)
[v2] Thu, 18 May 2017 19:32:27 UTC (59 KB)
[v3] Sat, 4 Nov 2017 00:38:19 UTC (30 KB)
