REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark

Hai, Nam Le; Nguyen, Dung Manh; Bui, Nghi D. Q.

arXiv:2406.11927 (cs)

[Submitted on 17 Jun 2024 (v1), last revised 19 Jun 2024 (this version, v2)]

Abstract: The ability of CodeLLMs to generate executable and functionally correct code at the repository-level scale remains largely unexplored. We introduce RepoExec, a novel benchmark for evaluating code generation at the repository-level scale. RepoExec focuses on three main aspects: executability, functional correctness through automated test case generation with high coverage rate, and carefully crafted cross-file contexts to accurately generate code. Our work explores a controlled scenario where developers specify necessary code dependencies, challenging the model to integrate these accurately. Experiments show that while pretrained LLMs outperform instruction-tuned models in correctness, the latter excel in utilizing provided dependencies and demonstrating debugging capabilities. We also introduce a new instruction-tuned dataset that focuses on code dependencies and demonstrate that CodeLLMs fine-tuned on our dataset have a better capability to leverage these dependencies effectively. RepoExec aims to provide a comprehensive evaluation of code functionality and alignment with developer intent, paving the way for more reliable and applicable CodeLLMs in real-world scenarios. The dataset and source code can be found at this https URL.

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.11927 [cs.SE]
	(or arXiv:2406.11927v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2406.11927

Submission history

From: Nghi D. Q. Bui
[v1] Mon, 17 Jun 2024 10:45:22 UTC (1,036 KB)
[v2] Wed, 19 Jun 2024 05:27:32 UTC (1,003 KB)
