The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents

Tang, Yihong; Chen, Kehai; Bai, Xuefeng; Niu, Zhengyu; Wang, Bo; Liu, Jie; Zhang, Min

arXiv:2502.20757 (cs)

[Submitted on 28 Feb 2025]

Abstract: Large Language Models (LLMs) have made remarkable advances in role-playing dialogue agents, demonstrating their utility in character simulations. However, it remains challenging for these agents to balance character portrayal utility with content safety because this essential character simulation often comes with the risk of generating unsafe content. To address this issue, we first conduct a systematic exploration of the safety-utility trade-off across multiple LLMs. Our analysis reveals that risk scenarios created by villain characters and user queries (referred to as risk coupling) contribute to this trade-off. Building on this, we propose a novel Adaptive Dynamic Multi-Preference (ADMP) method, which dynamically adjusts safety-utility preferences based on the degree of risk coupling and guides the model to generate responses biased toward utility or safety. We further introduce Coupling Margin Sampling (CMS) into coupling detection to enhance the model's ability to handle high-risk scenarios. Experimental results demonstrate that our approach improves safety metrics while maintaining utility.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.20757 [cs.CL]
	(or arXiv:2502.20757v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.20757

Submission history

From: Yihong Tang
[v1] Fri, 28 Feb 2025 06:18:50 UTC (5,739 KB)
