UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

Qu, Yiting; Shen, Xinyue; Wu, Yixin; Backes, Michael; Zannettou, Savvas; Zhang, Yang

arXiv:2405.03486 (cs)

[Submitted on 6 May 2024 (v1), last revised 5 Sep 2024 (this version, v2)]

Abstract: With the advent of text-to-image models and concerns about their misuse, developers are increasingly relying on image safety classifiers to moderate their generated unsafe images. Yet, the performance of current image safety classifiers remains unknown for both real-world and AI-generated images. In this work, we propose UnsafeBench, a benchmarking framework that evaluates the effectiveness and robustness of image safety classifiers, with a particular focus on the impact of AI-generated images on their performance. First, we curate a large dataset of 10K real-world and AI-generated images that are annotated as safe or unsafe based on a set of 11 unsafe categories of images (sexual, violent, hateful, etc.). Then, we evaluate the effectiveness and robustness of five popular image safety classifiers, as well as three classifiers that are powered by general-purpose visual language models. Our assessment indicates that existing image safety classifiers are not comprehensive and effective enough to mitigate the multifaceted problem of unsafe images. Also, there exists a distribution shift between real-world and AI-generated images in image qualities, styles, and layouts, leading to degraded effectiveness and robustness. Motivated by these findings, we build a comprehensive image moderation tool called PerspectiveVision, which addresses the main drawbacks of existing classifiers with improved effectiveness and robustness, especially on AI-generated images. UnsafeBench and PerspectiveVision can aid the research community in better understanding the landscape of image safety classification in the era of generative AI.

Subjects:	Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
Cite as:	arXiv:2405.03486 [cs.CR]
	(or arXiv:2405.03486v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2405.03486

Submission history

From: Yiting Qu
[v1] Mon, 6 May 2024 13:57:03 UTC (14,489 KB)
[v2] Thu, 5 Sep 2024 20:23:19 UTC (14,788 KB)
