Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation

Zugecova, Aneta; Macko, Dominik; Srba, Ivan; Moro, Robert; Kopal, Jakub; Marcincinova, Katarina; Mesarcik, Matus

arXiv:2412.13666 (cs)

[Submitted on 18 Dec 2024]

Abstract: The capabilities of recent large language models (LLMs) to generate high-quality content that humans cannot distinguish from human-written texts raise many concerns regarding their misuse. Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives. Their capabilities to generate personalized (in various aspects) content have also been evaluated and mostly found usable. However, the combination of personalization and disinformation abilities of LLMs has not yet been comprehensively studied. Such a dangerous combination should trigger the integrated safety filters of the LLMs, if any are present. This study fills this gap by evaluating the vulnerabilities of recent open and closed LLMs and their willingness to generate personalized disinformation news articles in English. We further explore whether the LLMs can reliably meta-evaluate the personalization quality and whether the personalization affects the detectability of the generated texts. Our results demonstrate the need for stronger safety filters and disclaimers, as those are not properly functioning in most of the evaluated LLMs. Additionally, our study revealed that the personalization actually reduces the safety-filter activations, thus effectively functioning as a jailbreak. Such behavior must be urgently addressed by LLM developers and service providers.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as:	arXiv:2412.13666 [cs.CL]
	(or arXiv:2412.13666v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.13666

Submission history

From: Dominik Macko
[v1] Wed, 18 Dec 2024 09:48:53 UTC (8,998 KB)
