TESTEVAL: Benchmarking Large Language Models for Test Case Generation

Wang, Wenhan; Yang, Chenyuan; Wang, Zhijie; Huang, Yuheng; Chu, Zhaoyang; Song, Da; Zhang, Lingming; Chen, An Ran; Ma, Lei

arXiv:2406.04531 (cs)

[Submitted on 6 Jun 2024]

Abstract: Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program under test. Recently, researchers have recognized the potential of large language models (LLMs) in software testing. However, there remains a lack of fair comparisons between different LLMs in terms of test case generation capabilities.
In this paper, we propose TESTEVAL, a novel benchmark for test case generation with LLMs. We collect 210 Python programs from an online programming platform, LeetCode, and design three different tasks: overall coverage, targeted line/branch coverage, and targeted path coverage. We further evaluate sixteen popular LLMs, including both commercial and open-source ones, on TESTEVAL. We find that generating test cases to cover specific program lines/branches/paths is still challenging for current LLMs, indicating a lack of ability to comprehend program logic and execution paths. We have open-sourced our dataset and benchmark pipelines at this https URL to contribute to and accelerate future research on LLMs for software testing.
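
As a rough illustration of what the overall coverage and targeted line coverage tasks ask for, the sketch below records which lines of a small Python function a given test input executes. It is not taken from the TESTEVAL pipeline: the helper `executed_lines` and the toy program `classify` are hypothetical names introduced here, and the benchmark's own tooling may measure coverage differently. Branch and path coverage are not modeled.

```python
import sys


def executed_lines(func, *args):
    """Run func(*args) and return the set of line offsets (relative to the
    function's `def` line) that were executed inside func itself."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames of any other function
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()  # restore any existing tracer afterwards
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


# Hypothetical program under test; comments give each line's offset.
def classify(n):               # offset 0
    if n < 0:                  # offset 1
        return "negative"      # offset 2
    if n == 0:                 # offset 3
        return "zero"          # offset 4
    return "positive"          # offset 5


ALL_LINES = {1, 2, 3, 4, 5}    # executable lines of classify


def overall_coverage(inputs):
    """Fraction of classify's lines executed by a set of test inputs."""
    covered = set()
    for value in inputs:
        covered |= executed_lines(classify, value)
    return len(covered & ALL_LINES) / len(ALL_LINES)


def covers_target(value, target_offset):
    """Targeted line coverage: does this single input reach the target line?"""
    return target_offset in executed_lines(classify, value)
```

In this toy setting, a generated test input of `0` reaches the line at offset 4 while `5` does not, and a test suite needs all three kinds of input (negative, zero, positive) to execute every line.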

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2406.04531 [cs.SE]
	(or arXiv:2406.04531v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2406.04531

Submission history

From: Wenhan Wang
[v1] Thu, 6 Jun 2024 22:07:50 UTC (994 KB)
