Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models

In, Yeonjun; Kim, Wonjoong; Yoon, Kanghoon; Kim, Sungchul; Tanjim, Mehrab; Kim, Kibum; Park, Chanyoung

arXiv:2502.15086 (cs)

[Submitted on 20 Feb 2025]

Abstract: As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety, but they define safety largely in terms of general standards and overlook user-specific standards. However, safety standards for LLMs may vary with a user's profile rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals that current LLMs fail to act safely when considering user-specific safety standards, marking a new discovery in this field. To address this vulnerability, we propose a simple remedy based on chain-of-thought, demonstrating its effectiveness in improving user-specific safety. Our benchmark and code are available at this https URL.

Comments:	Under review
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.15086 [cs.CL]
	(or arXiv:2502.15086v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.15086

Submission history

From: Yeonjun In
[v1] Thu, 20 Feb 2025 22:58:44 UTC (1,274 KB)
