On Ensuring that Intelligent Machines Are Well-Behaved

Thomas, Philip S.; da Silva, Bruno Castro; Barto, Andrew G.; Brunskill, Emma

arXiv:1708.05448 (cs)

[Submitted on 17 Aug 2017]

Abstract: Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are well-behaved (that they do not, for example, cause harm to humans or act in a racist or sexist way) is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we address here. We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors. To show the viability of this new framework, we use it to create new machine learning algorithms that preclude the sexist and harmful behaviors exhibited by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:1708.05448 [cs.AI]
	(or arXiv:1708.05448v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.1708.05448

Submission history

From: Philip Thomas
[v1] Thu, 17 Aug 2017 21:53:47 UTC (5,393 KB)
