Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Bajcsy, Andrea; Fisac, Jaime F.

arXiv:2405.09794 (cs)

[Submitted on 16 May 2024 (v1), last revised 22 Jun 2024 (this version, v2)]

Abstract: Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human-AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.

Comments:	Revised version with refined exposition and technical details. 12 pages + references, 5 figures
Subjects:	Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Systems and Control (eess.SY)
ACM classes:	I.2
Cite as:	arXiv:2405.09794 [cs.AI]
	(or arXiv:2405.09794v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2405.09794

Submission history

From: Jaime Fisac
[v1] Thu, 16 May 2024 03:52:00 UTC (2,002 KB)
[v2] Sat, 22 Jun 2024 20:17:22 UTC (2,053 KB)
