SLACC: Simion-based Language Agnostic Code Clones

Mathew, George; Parnin, Chris; Stolee, Kathryn T

doi:10.1145/3377811.3380407

Abstract: Successful cross-language clone detection could enable researchers and developers to create robust language migration tools, facilitate learning additional programming languages once one is mastered, and promote reuse of code snippets over a broader codebase. However, identifying cross-language clones presents special challenges to the clone detection problem. Because arbitrary languages lack a common underlying representation, detecting clones requires one of the following solutions: 1) a static analysis framework replicated across each targeted language, with annotations matching language features across all languages, or 2) a dynamic analysis framework that detects clones based on runtime behavior.
In this work, we demonstrate the feasibility of the latter solution with SLACC, a dynamic analysis approach for cross-language clone detection. Like prior clone detection techniques, we use input/output behavior to match clones, but we overcome limitations of prior work by amplifying the number of inputs and covering more data types, and as a result achieve better clusters than prior attempts. Because clusters are generated from input/output behavior, SLACC supports cross-language clone detection. As an added challenge, we target a statically typed language, Java, and a dynamically typed language, Python. Compared to HitoshiIO, a recent clone detection tool for Java, SLACC retrieves 6 times as many clusters and has higher precision (86.7% vs. 30.7%).
This is the first work to perform clone detection for dynamically typed languages (precision = 87.3%) and the first to perform clone detection across languages that lack a common underlying representation (precision = 94.1%). It provides a first step towards the larger goal of scalable language migration tools.
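The central idea of the abstract, grouping code fragments by their input/output behavior rather than by their syntax, can be illustrated with a small sketch. The Python below is not SLACC's implementation. The names `generate_inputs`, `behavior_signature`, `cluster_by_behavior`, and the candidate functions are hypothetical. All candidates are Python (a cross-language version would also have to execute the Java side on the same inputs), and the input generator covers only lists of integers, whereas the abstract describes amplifying inputs across more data types.

```python
import operator
import random
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical candidate functions: hand-written Python stand-ins for
# fragments that a real tool would extract from code bases.
def sum_loop(xs: List[int]) -> int:
    total = 0
    for x in xs:
        total += x
    return total


def sum_builtin(xs: List[int]) -> int:
    return sum(xs)


def sum_reduce(xs: List[int]) -> int:
    return reduce(operator.add, xs, 0)


def sum_positive(xs: List[int]) -> int:
    # Agrees with the sums above only when no element is negative.
    return sum(x for x in xs if x > 0)


def max_of(xs: List[int]) -> int:
    # Raises ValueError on an empty list.
    return max(xs)


def generate_inputs(n_random: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical input generator: fixed edge cases plus seeded random lists."""
    rng = random.Random(seed)
    inputs: List[List[int]] = [[], [0], [-1, 2], [3, 3, 3]]
    for _ in range(n_random):
        length = rng.randint(0, 6)
        inputs.append([rng.randint(-50, 50) for _ in range(length)])
    return inputs


def behavior_signature(fn: Callable[[List[int]], Any],
                       inputs: List[List[int]]) -> Tuple[Tuple[str, str], ...]:
    """Record the output (or the exception type) of fn on every input."""
    observed = []
    for xs in inputs:
        try:
            # Pass a copy so a mutating function cannot affect later runs.
            observed.append(("ok", repr(fn(list(xs)))))
        except Exception as exc:
            observed.append(("error", type(exc).__name__))
    return tuple(observed)


def cluster_by_behavior(functions: Dict[str, Callable[[List[int]], Any]],
                        inputs: List[List[int]]) -> List[List[str]]:
    """Group functions whose behavior signatures are identical."""
    groups: Dict[Tuple[Tuple[str, str], ...], List[str]] = defaultdict(list)
    for name, fn in functions.items():
        groups[behavior_signature(fn, inputs)].append(name)
    # Only groups with two or more members count as clone clusters.
    return [sorted(names) for names in groups.values() if len(names) > 1]


candidates = {
    "sum_loop": sum_loop,
    "sum_builtin": sum_builtin,
    "sum_reduce": sum_reduce,
    "sum_positive": sum_positive,
    "max_of": max_of,
}
clusters = cluster_by_behavior(candidates, generate_inputs(20))
print(clusters)  # [['sum_builtin', 'sum_loop', 'sum_reduce']]
```

Under these assumptions, the three summation variants share a behavior signature and form one cluster, while `sum_positive` is separated by the input `[-1, 2]` and `max_of` by the empty list. This shows why input coverage matters: with too few or too uniform inputs, functions that agree only on common cases would be wrongly merged into one cluster.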

Comments:	11 Pages, 3 Figures, Accepted at ICSE 2020 technical track
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2002.03039 [cs.SE]
	(or arXiv:2002.03039v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2002.03039
Related DOI:	https://doi.org/10.1145/3377811.3380407
