RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability

Zhang, Yichi; Zeng, Zihao; Li, Dongbai; Huang, Yao; Deng, Zhijie; Dong, Yinpeng

arXiv:2504.10081 (cs)

[Submitted on 14 Apr 2025]

Abstract: Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have progressed rapidly and achieved breakthrough performance on complex reasoning tasks such as mathematics and coding. However, the open-source R1 models have raised safety concerns in wide applications, notably a tendency to comply with malicious queries, which substantially limits the utility of these powerful models in practice. In this paper, we introduce RealSafe-R1, safety-aligned versions of the DeepSeek-R1 distilled models. To train these models, we construct a dataset of 15k safety-aware reasoning trajectories generated by DeepSeek-R1 under explicit instructions for the expected refusal behavior. Both quantitative experiments and qualitative case studies demonstrate that the models' safety guardrails improve against both harmful queries and jailbreak attacks. Importantly, unlike prior safety alignment efforts that often compromise reasoning performance, our method preserves the models' reasoning capabilities by keeping the training data within the original generation distribution. Model weights of RealSafe-R1 are open-source at this https URL.

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2504.10081 [cs.AI]
	(or arXiv:2504.10081v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2504.10081

Submission history

From: Yichi Zhang
[v1] Mon, 14 Apr 2025 10:26:37 UTC (373 KB)
