KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments

Park, Junyoung; Jones, Dalton; Morse, Matt J; Goel, Raghavv; Lee, Mingu; Lott, Chris

arXiv:2504.15364 (cs)

[Submitted on 21 Apr 2025 (v1), last revised 23 Apr 2025 (this version, v2)]

Abstract: In this work, we demonstrate that distinctive keys during LLM inference tend to have high attention scores. We explore this phenomenon and propose KeyDiff, a training-free KV cache eviction method based on key similarity. This method facilitates the deployment of LLM-based applications that require long input prompts in resource-constrained environments with limited memory and compute budgets. Unlike other KV cache eviction methods, KeyDiff can process arbitrarily long prompts within strict resource constraints and efficiently generate responses. We demonstrate that KeyDiff computes the optimal solution to a KV cache selection problem that maximizes key diversity, providing a theoretical understanding of KeyDiff. Notably, KeyDiff does not rely on attention scores, allowing the use of optimized attention mechanisms like FlashAttention. We demonstrate the effectiveness of KeyDiff across diverse tasks and models, illustrating a performance gap of less than 0.04% with an 8K cache budget (approximately 23% KV cache reduction) relative to the non-evicting baseline on the LongBench benchmark for Llama 3.1-8B and Llama 3.2-3B.
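
The abstract describes eviction driven by key similarity rather than attention scores, but it does not state the exact scoring rule. The following is a minimal sketch under one assumed rule: each cached key is scored by its cosine similarity to the mean key direction, and the least similar (most distinctive) keys are retained. The function names `select_keys_to_keep` and `_unit`, the choice of the mean direction as the anchor, and the single-head list-of-vectors layout are all illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def select_keys_to_keep(keys: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices, in token order, of the `budget` most distinctive keys.

    Hypothetical helper: `keys` holds one vector per cached token for a
    single attention head. No attention scores are used.
    """
    if budget >= len(keys):
        return list(range(len(keys)))
    unit_keys = [_unit(k) for k in keys]
    dim = len(unit_keys[0])
    # Assumed anchor: the mean direction of all cached keys.
    anchor = _unit([sum(k[d] for k in unit_keys) / len(unit_keys) for d in range(dim)])
    # Cosine similarity to the anchor; a low value marks a distinctive key.
    similarity = [sum(a * b for a, b in zip(k, anchor)) for k in unit_keys]
    # Keep the least similar keys, then restore the original token order.
    ranked = sorted(range(len(keys)), key=lambda i: similarity[i])
    return sorted(ranked[:budget])


# Usage: three keys point in nearly the same direction, one does not.
keys = [
    [1.0, 0.0],
    [0.99, 0.01],
    [0.0, 1.0],  # points away from the others, so it should be retained
    [1.0, 0.02],
]
kept = select_keys_to_keep(keys, budget=2)
```

Because the score depends only on the keys, a sketch like this never needs the attention matrix to be materialized, which is the property the abstract credits for compatibility with fused kernels such as FlashAttention.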

Comments:	8 pages, 14 figures
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2504.15364 [cs.AI]
	(or arXiv:2504.15364v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2504.15364

Submission history

From: Junyoung Park
[v1] Mon, 21 Apr 2025 18:12:46 UTC (27,250 KB)
[v2] Wed, 23 Apr 2025 18:02:55 UTC (27,250 KB)