Large Language Models of Code Fail at Completing Code with Potential Bugs

Dinh, Tuan; Zhao, Jinman; Tan, Samson; Negrinho, Renato; Lausen, Leonard; Zha, Sheng; Karypis, George

arXiv:2306.03438 (cs)

[Submitted on 6 Jun 2023 (v1), last revised 1 Dec 2023 (this version, v2)]

Abstract: Large language models of code (Code-LLMs) have recently brought tremendous advances to code completion, a fundamental feature of programming assistance and code intelligence. However, most existing works ignore the possible presence of bugs in the code context for generation, which are inevitable in software development. Therefore, we introduce and study the buggy-code completion problem, inspired by the realistic scenario of real-time code suggestion where the code context contains potential bugs -- anti-patterns that can become bugs in the completed program. To systematically study the task, we introduce two datasets: one with synthetic bugs derived from semantics-altering operator changes (buggy-HumanEval) and one with realistic bugs derived from user submissions to coding problems (buggy-FixEval). We find that the presence of potential bugs significantly degrades the generation performance of the high-performing Code-LLMs. For instance, the passing rates of CODEGEN-2B-MONO on test cases of buggy-HumanEval drop more than 50% given a single potential bug in the context. Finally, we investigate several post-hoc methods for mitigating the adverse effect of potential bugs and find that there remains a significant gap in post-mitigation performance.
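
To make the idea of a "potential bug" concrete, the following is a minimal sketch of a semantics-altering operator change applied to a partial program. It is an illustration only and is not taken from buggy-HumanEval or from the authors' code: the function `count_positive`, the helper names `inject_operator_bug` and `passes_tests`, and the test inputs are all hypothetical. It shows that a completion which is correct for the clean prefix fails once one operator in the prefix is flipped, while a completion that adapts to the changed prefix still passes.

```python
# Hypothetical example; names and tests are assumptions, not the paper's data.

# Partial program (the "code context") that a model is asked to complete.
CLEAN_PREFIX = '''def count_positive(nums):
    count = 0
    for n in nums:
        if n > 0:
'''

# A completion that is correct when appended to the clean prefix.
COMPLETION = '''            count += 1
    return count
'''

# A completion that adapts to the flipped condition instead of following it.
ADAPTED_COMPLETION = '''            continue
        count += 1
    return count
'''


def inject_operator_bug(prefix, old=" > ", new=" <= "):
    """Flip the first occurrence of one operator to create a potential bug."""
    if old not in prefix:
        raise ValueError("operator not found in prefix")
    return prefix.replace(old, new, 1)


def passes_tests(source):
    """Run the completed program against a few hand-written test cases."""
    namespace = {}
    exec(source, namespace)
    f = namespace["count_positive"]
    return f([1, 2, 3, -4]) == 3 and f([]) == 0 and f([-1, -5]) == 0


BUGGY_PREFIX = inject_operator_bug(CLEAN_PREFIX)

clean_ok = passes_tests(CLEAN_PREFIX + COMPLETION)
naive_ok = passes_tests(BUGGY_PREFIX + COMPLETION)
adapted_ok = passes_tests(BUGGY_PREFIX + ADAPTED_COMPLETION)
print(clean_ok, naive_ok, adapted_ok)
```

The flipped operator is only a potential bug: whether the finished program is wrong depends on how the completion continues from it, which is why the adapted completion can still pass.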

Comments:	27 pages, accepted to NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Software Engineering (cs.SE)
Cite as:	arXiv:2306.03438 [cs.LG]
	(or arXiv:2306.03438v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.03438

Submission history

From: Jinman Zhao
[v1] Tue, 6 Jun 2023 06:35:27 UTC (1,330 KB)
[v2] Fri, 1 Dec 2023 01:27:37 UTC (1,335 KB)