Repository Structure-Aware Training Makes SLMs Better Issue Resolver

Ma, Zexiong; An, Shengnan; Lin, Zeqi; Zou, Yanzhen; Xie, Bing

arXiv:2412.19031 (cs)

[Submitted on 26 Dec 2024]

Abstract: Language models have been applied to various software development tasks, but their performance varies with model scale. Large Language Models (LLMs) outperform Small Language Models (SLMs) on complex tasks such as repository-level issue resolving, but raise concerns about privacy and cost. SLMs, in contrast, are more accessible but underperform on complex tasks. In this paper, we introduce ReSAT (Repository Structure-Aware Training), which constructs training data from a large number of issues and their corresponding pull requests in open-source communities to enhance the model's understanding of repository structure and its issue-resolving ability. We construct two types of training data: (1) localization training data, multi-level progressive localization data that improves code understanding and localization capability; (2) code edit training data, which improves context-based code editing capability. Evaluation results on SWE-Bench-verified and RepoQA demonstrate that ReSAT effectively enhances SLMs' issue-resolving and repository-level long-context understanding capabilities.
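
As a rough illustration of the two data types named in the abstract, the sketch below derives localization targets and code-edit pairs from one issue and the unified diff of its pull request. The abstract does not specify the data format, so everything here is an assumption made for illustration: the record fields, the choice of "file" and "line" as the localization levels, the names `parse_unified_diff` and `build_samples`, and the sample issue and diff. It is not the authors' pipeline.

```python
import re
from dataclasses import dataclass

# Matches a unified-diff hunk header such as "@@ -10,3 +10,3 @@ def mean(xs):"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    path: str      # file touched by the hunk
    start: int     # first pre-change line covered by the hunk
    length: int    # number of pre-change lines covered by the hunk
    removed: list  # lines deleted by the pull request
    added: list    # lines introduced by the pull request


def parse_unified_diff(diff_text):
    """Minimal parser (assumption: plain text diff, no renames or binary files).

    Simplification: a removed line whose content starts with "-- " would be
    mistaken for a file header; a real pipeline should use a proper diff parser.
    """
    hunks, path, current = [], None, None
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            current = None
        elif line.startswith("+++ "):
            path = line[4:].removeprefix("b/")  # Python 3.9+
            current = None
        elif (m := HUNK_RE.match(line)):
            current = Hunk(path, int(m.group(1)), int(m.group(2) or 1), [], [])
            hunks.append(current)
        elif current is not None:
            if line.startswith("-"):
                current.removed.append(line[1:])
            elif line.startswith("+"):
                current.added.append(line[1:])
    return hunks


def build_samples(issue_text, diff_text):
    """Return (localization_samples, code_edit_samples) in a hypothetical schema."""
    hunks = parse_unified_diff(diff_text)
    localization = [
        # Coarse level: which files does the fix touch?
        {"level": "file", "prompt": issue_text,
         "target": sorted({h.path for h in hunks})},
        # Fine level: which pre-change line ranges does the fix touch?
        {"level": "line", "prompt": issue_text,
         "target": [(h.path, h.start, h.start + h.length - 1) for h in hunks]},
    ]
    code_edit = [
        {"prompt": issue_text, "path": h.path,
         "before": "\n".join(h.removed), "after": "\n".join(h.added)}
        for h in hunks
    ]
    return localization, code_edit


# Hypothetical issue/PR pair, invented for this example.
SAMPLE_DIFF = "\n".join([
    "diff --git a/pkg/util.py b/pkg/util.py",
    "--- a/pkg/util.py",
    "+++ b/pkg/util.py",
    "@@ -10,3 +10,3 @@ def mean(xs):",
    "     total = sum(xs)",
    "-    return total / len(xs)",
    "+    return total / max(len(xs), 1)",
    " ",
])

if __name__ == "__main__":
    loc, edits = build_samples("mean() crashes on empty input", SAMPLE_DIFF)
    print(loc)
    print(edits)
```

Ordering the localization records from coarse to fine is one plausible reading of "multi-level progressive"; the paper itself should be consulted for the actual levels and prompt templates.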

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2412.19031 [cs.SE]
	(or arXiv:2412.19031v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2412.19031

Submission history

From: Zexiong Ma
[v1] Thu, 26 Dec 2024 03:01:32 UTC (680 KB)
