Prompt Leakage effect and defense strategies for multi-turn LLM interactions

Agarwal, Divyansh; Fabbri, Alexander R.; Risher, Ben; Laban, Philippe; Joty, Shafiq; Wu, Chien-Sheng

arXiv:2404.16251 (cs)

[Submitted on 24 Apr 2024 (v1), last revised 29 Jul 2024 (this version, v3)]

Abstract: Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. Our standardized setup further allows dissecting leakage of specific prompt contents such as task instructions and knowledge documents. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts. We present different combinations of defenses against our threat model, including a cost analysis. Our study highlights key takeaways for building secure LLM applications and provides directions for research in multi-turn LLM interactions.
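
To make the ASR figures above concrete, here is a minimal sketch of one plausible way to compute an attack success rate over multi-turn conversations. It is an illustration only and not the authors' evaluation code: the function names, the "any turn so far leaked" aggregation rule, and the toy data are all assumptions, and the abstract does not specify how leakage is detected or aggregated.

```python
from typing import Sequence


def attack_success_rate(leaked: Sequence[bool]) -> float:
    """Fraction of attack attempts that were flagged as leaking the prompt."""
    if not leaked:
        raise ValueError("need at least one attack attempt")
    return sum(leaked) / len(leaked)


def cumulative_asr(conversations: Sequence[Sequence[bool]], turn: int) -> float:
    """ASR when a conversation counts as a success if any of its first
    `turn` turns leaked (an assumed aggregation rule, not the paper's)."""
    return attack_success_rate([any(flags[:turn]) for flags in conversations])


# Hypothetical per-turn leak flags for four conversations (toy data, not from the paper).
toy_conversations = [
    [False, True],
    [True, True],
    [False, False],
    [False, True],
]

asr_turn_1 = cumulative_asr(toy_conversations, 1)  # 1 of 4 conversations leaked
asr_turn_2 = cumulative_asr(toy_conversations, 2)  # 3 of 4 conversations leaked
print(f"turn 1 ASR: {asr_turn_1:.2%}, turn 2 ASR: {asr_turn_2:.2%}")
```

Under this assumed definition the rate can only stay flat or rise with each added turn, which is one reason a multi-turn rate is not directly comparable to a single-turn rate unless both use the same counting rule.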

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2404.16251 [cs.CR]
	(or arXiv:2404.16251v3 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2404.16251

Submission history

From: Divyansh Agarwal
[v1] Wed, 24 Apr 2024 23:39:58 UTC (261 KB)
[v2] Fri, 26 Apr 2024 07:47:49 UTC (261 KB)
[v3] Mon, 29 Jul 2024 17:16:19 UTC (973 KB)
