The Program Testing Ability of Large Language Models for Code

Xiong, Weimin; Guo, Yiwen; Chen, Hao

arXiv:2310.05727 (cs)

[Submitted on 9 Oct 2023]

Abstract: The recent development of large language models (LLMs) for code, such as Codex and CodeT5+, demonstrates tremendous promise in achieving code intelligence. Their ability to synthesize code that completes a program performing a pre-defined task has been intensively tested and verified on benchmark datasets including HumanEval and MBPP. Yet, given their broad scope of applications in software engineering, evaluating these LLMs from perspectives beyond program synthesis is also called for. In this paper, we explore the ability of LLMs to test programs/code. By performing thorough analyses of recent LLMs for code in program testing, we show a series of intriguing properties of these models and demonstrate how the program testing ability of LLMs can be improved. Following recent work that utilizes generated test cases to enhance program synthesis, we further leverage our findings to improve the quality of the synthesized programs and show +11.77% and +4.22% higher code pass rates on HumanEval+ compared with the GPT-3.5-turbo baseline and the recent state-of-the-art, respectively.
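
The idea of using generated test cases to improve program synthesis can be illustrated with a minimal sketch: run every synthesized candidate against a set of model-generated tests and keep the candidate that passes the most. This assumes candidates are Python source strings and tests are single assert statements; the names `passes` and `select_best` are hypothetical and chosen for illustration. It is not the authors' method, and published approaches such as CodeT use more elaborate scoring. The sketch calls exec() with no sandbox or timeout, so it is only suitable for trusted toy inputs.

```python
from typing import Dict, List, Tuple


def passes(program_src: str, test_src: str) -> bool:
    """Return True if the candidate program runs and the test raises nothing.

    Assumption: tests are plain assert statements. WARNING: exec() runs
    arbitrary code with no sandbox and no timeout; a real harness would
    isolate execution in a separate process with resource limits.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(program_src, namespace)  # define the candidate's functions
        exec(test_src, namespace)     # run the generated test against them
        return True
    except Exception:                 # AssertionError, runtime or syntax errors
        return False


def select_best(candidates: List[str], tests: List[str]) -> Tuple[str, List[int]]:
    """Pick the candidate passing the most generated tests (ties: earliest)."""
    if not candidates:
        raise ValueError("need at least one candidate program")
    scores = [sum(passes(c, t) for t in tests) for c in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index], scores


# Usage: two hypothetical synthesized candidates and three generated tests.
candidates = [
    "def add(a, b):\n    return a - b\n",  # buggy candidate
    "def add(a, b):\n    return a + b\n",  # correct candidate
]
generated_tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(-1, 1) == 0",
]
best, scores = select_best(candidates, generated_tests)
print(scores)  # [1, 3]
```

Because generated tests can themselves be wrong, a raw pass count is a noisy signal; the sketch only shows where test generation plugs into candidate selection.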

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as:	arXiv:2310.05727 [cs.CL]
	(or arXiv:2310.05727v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.05727

Submission history

From: Weimin Xiong
[v1] Mon, 9 Oct 2023 13:55:45 UTC (1,064 KB)