A Static Evaluation of Code Completion by Large Language Models

Ding, Hantian; Kumar, Varun; Tian, Yuchen; Wang, Zijian; Kwiatkowski, Rob; Li, Xiaopeng; Ramanathan, Murali Krishna; Ray, Baishakhi; Bhatia, Parminder; Sengupta, Sudipta; Roth, Dan; Xiang, Bing

arXiv:2306.03203 (cs)

[Submitted on 5 Jun 2023]

Abstract: Large language models trained on code have shown great potential to increase productivity of software developers. Several execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform the same evaluation on complex real-world projects considering the execution cost. On the contrary, static analysis tools such as linters, which can detect errors without running the program, haven't been well explored for evaluating code generation models. In this work, we propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees. Compared with execution-based evaluation, our method is not only more efficient, but also applicable to code in the wild. For experiments, we collect code context from open source repos to generate one million function bodies using public models. Our static analysis reveals that Undefined Name and Unused Variable are the most common errors among others made by language models. Through extensive studies, we also show the impact of sampling temperature, model size, and context on static errors in code completions.
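As a rough illustration of the idea in the abstract, the sketch below parses a completion into an Abstract Syntax Tree with Python's standard `ast` module and reports three coarse error types without executing the code. It is not the authors' framework: the function names `bound_names` and `static_errors` are hypothetical, and the checks are deliberately simplified (name binding is treated as scope-insensitive, and `global`/`nonlocal` declarations are ignored), so a real linter would be more precise.

```python
import ast
import builtins


def bound_names(tree: ast.AST) -> set[str]:
    """Collect every name the snippet binds anywhere (scope-insensitive)."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.alias):
            # `import os.path` binds `os`; `import x as y` binds `y`.
            names.add((node.asname or node.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            names.add(node.name)
    return names


def static_errors(source: str) -> list[tuple[str, str]]:
    """Return sorted (error_type, name) pairs found without running the code."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # No AST is available, so no further checks are possible.
        return [("SyntaxError", "")]

    known = bound_names(tree) | set(dir(builtins))
    errors = set()

    # Undefined Name: a name is read but never bound and is not a builtin.
    for node in ast.walk(tree):
        if (isinstance(node, ast.Name)
                and isinstance(node.ctx, ast.Load)
                and node.id not in known):
            errors.add(("UndefinedName", node.id))

    # Unused Variable: a name is assigned inside a function but never read there.
    for func in ast.walk(tree):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            stored, loaded = set(), set()
            for node in ast.walk(func):
                if isinstance(node, ast.Name):
                    if isinstance(node.ctx, ast.Store):
                        stored.add(node.id)
                    else:
                        loaded.add(node.id)
            errors.update(("UnusedVariable", name) for name in stored - loaded)

    return sorted(errors)


# A made-up completion with a misspelled name and an assignment that is never read.
completion = '''
def area(radius):
    pi = 3.14159
    scale = 2
    return pi * radus ** 2
'''
print(static_errors(completion))
# [('UndefinedName', 'radus'), ('UnusedVariable', 'scale')]
```

Because nothing is executed, a check of this kind needs no test harness or project dependencies, which is what makes static evaluation applicable to arbitrary code collected from repositories.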

Comments: Accepted by ACL 2023 industry track
Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)
Cite as: arXiv:2306.03203 [cs.CL] (or arXiv:2306.03203v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2306.03203

Submission history

From: Hantian Ding
[v1] Mon, 5 Jun 2023 19:23:34 UTC (7,616 KB)
