Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment

Huang, Allison; Pi, Yulu Niki; Mougan, Carlos

arXiv:2411.11731 (cs)

[Submitted on 18 Nov 2024]

Abstract: We explore how large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks. Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion. In the first experiment, we examine the susceptibility to moral ambiguity by evaluating a Base Agent LLM on morally ambiguous scenarios and observing how a Persuader Agent attempts to modify the Base Agent's initial decisions. The second experiment evaluates the susceptibility of LLMs to align with predefined ethical frameworks by prompting them to adopt specific value alignments rooted in established philosophical theories. The results demonstrate that LLMs can indeed be persuaded in morally charged scenarios, with the success of persuasion depending on factors such as the model used, the complexity of the scenario, and the conversation length. Notably, LLMs of distinct sizes but from the same company produced markedly different outcomes, highlighting the variability in their susceptibility to ethical persuasion.
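The first experiment described in the abstract (a Base Agent decides, a Persuader Agent tries to change that decision over a conversation) can be pictured with a minimal sketch. This is an illustration only, not the authors' implementation: the function `run_persuasion`, the `max_turns` parameter, the agent call signature, and the stub agents are all hypothetical, and in a real setup the stubs would be replaced by calls to actual LLMs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent interface: takes the scenario text and the conversation
# history (a list of (role, message) pairs) and returns a reply string.
History = List[Tuple[str, str]]
Agent = Callable[[str, History], str]


def run_persuasion(scenario: str, base_agent: Agent, persuader_agent: Agent,
                   max_turns: int = 3) -> Dict[str, object]:
    """Record the base agent's initial decision, then let the persuader argue
    for up to max_turns rounds, stopping as soon as the decision changes."""
    history: History = []
    initial = base_agent(scenario, history)
    history.append(("base", initial))
    current = initial

    for turn in range(1, max_turns + 1):
        argument = persuader_agent(scenario, history)
        history.append(("persuader", argument))
        current = base_agent(scenario, history)
        history.append(("base", current))
        if current != initial:
            return {"initial": initial, "final": current,
                    "persuaded": True, "turns": turn}

    return {"initial": initial, "final": current,
            "persuaded": False, "turns": max_turns}


# Stub agents standing in for LLM calls (assumptions for illustration only).
def stub_base(scenario: str, history: History) -> str:
    # Switches from option "A" to "B" once it has seen two persuader messages.
    n_args = sum(1 for role, _ in history if role == "persuader")
    return "B" if n_args >= 2 else "A"


def stub_persuader(scenario: str, history: History) -> str:
    return "Consider the consequences of choosing A."


result = run_persuasion("A morally ambiguous scenario.", stub_base, stub_persuader)
print(result)
```

Tracking the number of turns alongside the final decision reflects the abstract's remark that conversation length is one of the factors affecting whether persuasion succeeds.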

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.11731 [cs.CL]
	(or arXiv:2411.11731v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.11731

Submission history

From: Allison Huang
[v1] Mon, 18 Nov 2024 16:59:59 UTC (325 KB)
