CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities

Zhu, Yuxuan; Kellermann, Antony; Bowman, Dylan; Li, Philip; Gupta, Akul; Danda, Adarsh; Fang, Richard; Jensen, Conner; Ihli, Eric; Benn, Jason; Geronimo, Jet; Dhir, Avi; Rao, Sudhit; Yu, Kaicheng; Stone, Twm; Kang, Daniel

arXiv:2503.17332 (cs)

[Submitted on 21 Mar 2025 (v1), last revised 10 Apr 2025 (this version, v3)]

Abstract: Large language model (LLM) agents are increasingly capable of autonomously conducting cyberattacks, posing significant threats to existing applications. This growing risk highlights the urgent need for a real-world benchmark to evaluate the ability of LLM agents to exploit web application vulnerabilities. However, existing benchmarks fall short as they are limited to abstracted Capture the Flag competitions or lack comprehensive coverage. Building a benchmark for real-world vulnerabilities involves both specialized expertise to reproduce exploits and a systematic approach to evaluating unpredictable threats. To address this challenge, we introduce CVE-Bench, a real-world cybersecurity benchmark based on critical-severity Common Vulnerabilities and Exposures. In CVE-Bench, we design a sandbox framework that enables LLM agents to exploit vulnerable web applications in scenarios that mimic real-world conditions, while also providing effective evaluation of their exploits. Our evaluation shows that the state-of-the-art agent framework can resolve up to 13% of vulnerabilities.

Comments:	15 pages, 4 figures, 5 tables
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
ACM classes:	I.2.1; I.2.7
Cite as:	arXiv:2503.17332 [cs.CR]
	(or arXiv:2503.17332v3 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2503.17332

Submission history

From: Yuxuan Zhu
[v1] Fri, 21 Mar 2025 17:32:32 UTC (408 KB)
[v2] Tue, 1 Apr 2025 18:46:46 UTC (408 KB)
[v3] Thu, 10 Apr 2025 23:50:28 UTC (408 KB)
