Soteria: Language-Specific Functional Parameter Steering for Multilingual Safety Alignment

Banerjee, Somnath; Layek, Sayan; Chatterjee, Pratyush; Mukherjee, Animesh; Hazra, Rima

arXiv:2502.11244 (cs)

[Submitted on 16 Feb 2025]

Abstract: Ensuring consistent safety across multiple languages remains a significant challenge for large language models (LLMs). We introduce Soteria, a lightweight yet powerful strategy that locates and minimally adjusts the "functional heads" most responsible for harmful content generation in each language. By altering only a fraction of parameters, Soteria drastically reduces policy violations without sacrificing overall model performance, even in low-resource settings. To rigorously evaluate our approach, we also present XThreatBench, a specialized multilingual dataset capturing fine-grained harmful behaviors drawn from real policy guidelines. Experiments with leading open-source LLMs (e.g., Llama, Qwen, Mistral) show that Soteria consistently improves safety metrics across high-, mid-, and low-resource languages. These findings highlight a promising path toward scalable, linguistically attuned, and ethically aligned LLMs worldwide.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.11244 [cs.CL]
	(or arXiv:2502.11244v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.11244

Submission history

From: Somnath Banerjee
[v1] Sun, 16 Feb 2025 19:44:01 UTC (1,568 KB)
