RealHarm: A Collection of Real-World Language Model Application Failures

Le Jeune, Pierre; Liu, Jiaen; Rossi, Luca; Dora, Matteo

arXiv:2504.10277 (cs)

[Submitted on 14 Apr 2025]

Abstract: Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.

Subjects:	Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Cite as:	arXiv:2504.10277 [cs.CY]
	(or arXiv:2504.10277v1 [cs.CY] for this version)
	https://doi.org/10.48550/arXiv.2504.10277

Submission history

From: Matteo Dora
[v1] Mon, 14 Apr 2025 14:44:41 UTC (726 KB)
