Code Copycat Conundrum: Demystifying Repetition in LLM-based Code Generation

Liu, Mingwei; Li, Juntao; Wang, Ying; Du, Xueying; Ou, Zuoyu; Chen, Qiuyuan; An, Bingxu; Wei, Zhao; Xu, Yong; Zou, Fangming; Peng, Xin; Lou, Yiling

arXiv:2504.12608 (cs)

[Submitted on 17 Apr 2025]

Abstract: Despite recent advances in Large Language Models (LLMs) for code generation, the quality of LLM-generated code still faces significant challenges. One major issue is code repetition: the model's tendency to generate structurally redundant code, which leads to inefficiency and reduced readability. To address this, we conduct the first empirical study of the prevalence and nature of repetition across 19 state-of-the-art code LLMs on three widely used benchmarks. Our study includes both quantitative and qualitative analyses, revealing that repetition is pervasive and manifests at various granularities and extents, including the character, statement, and block levels. We further summarize a taxonomy of 20 repetition patterns. Building on our findings, we propose DeRep, a rule-based technique designed to detect and mitigate repetition in generated code. We evaluate DeRep both on open-source benchmarks and in an industrial setting. Our results demonstrate that DeRep significantly outperforms baselines in reducing repetition (with average improvements of 91.3%, 93.5%, and 79.9% in the rep-3, rep-line, and sim-line metrics, respectively) and in enhancing code quality (with a Pass@1 increase of 208.3% over greedy search). Furthermore, integrating DeRep improves the performance of existing repetition mitigation methods, with Pass@1 improvements ranging from 53.7% to 215.7%.
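
The abstract names the rep-3 and rep-line metrics without defining them. As a rough illustration only, the Python sketch below follows a formulation common in the text-generation literature: the fraction of n-grams (or lines) that duplicate an earlier one. The function names, the whitespace tokenization, and the exact formulas are assumptions made for this example and may differ from the paper's definitions. sim-line is omitted because its definition cannot be inferred from the abstract.

```python
def rep_n(tokens, n=3):
    """Share of n-grams that repeat an earlier n-gram (0.0 means no repetition).

    Assumed definition: 1 - (unique n-grams / total n-grams).
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def rep_line(code):
    """Share of non-empty lines that repeat an earlier line.

    Assumed definition: 1 - (unique lines / total lines), after stripping
    surrounding whitespace and skipping blank lines.
    """
    lines = [ln.strip() for ln in code.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return 1.0 - len(set(lines)) / len(lines)


if __name__ == "__main__":
    # Hypothetical generated snippet with one duplicated statement.
    sample = "x = 1\nx = 1\ny = 2\n"
    print(rep_n(sample.split(), n=3))  # whitespace tokens stand in for a real tokenizer
    print(rep_line(sample))
```

Under these assumed definitions, a lower score means less repetition, so an "improvement" in rep-3 or rep-line corresponds to a reduction in the score.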

Subjects:	Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2504.12608 [cs.SE]
	(or arXiv:2504.12608v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2504.12608

Submission history

From: Xueying Du
[v1] Thu, 17 Apr 2025 03:13:39 UTC (3,659 KB)
