Authorship Identification of Source Code Segments Written by Multiple Authors Using Stacking Ensemble Method

Mahbub, Parvez; Oishie, Naz Zarreen; Haque, S M Rafizul

doi:10.1109/ICCIT48885.2019.9038412

arXiv:2212.05610 (cs)

[Submitted on 11 Dec 2022]

Title: Authorship Identification of Source Code Segments Written by Multiple Authors Using Stacking Ensemble Method

Authors: Parvez Mahbub, Naz Zarreen Oishie, S M Rafizul Haque

Abstract: Source code segment authorship identification is the task of identifying the author of a source code segment through supervised learning. It is important in plagiarism detection, digital forensics, and several other law enforcement issues. However, when a source code segment is written by multiple authors, typical author identification methods no longer work. We propose an author identification technique, based on a stacking ensemble classifier, that can predict the authorship of source code segments even when they are written by multiple authors. The technique is built upon several deep neural network, random forest, and support vector machine classifiers. We show that a single classification technique is not sufficient for identifying the author group, and that a deep neural network-based stacking ensemble method can enhance the accuracy significantly. We compare the performance of the proposed technique with some existing methods that deal only with source code segments written by exactly one author. Although authorship identification for segments written by multiple authors is the harder task, the proposed technique achieves promising identification accuracy relative to these related works.
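
The stacking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, a hypothetical toy corpus, character n-gram TF-IDF features, and an encoding in which each author group (for example "alice+bob") is treated as one class label. The base learners mirror the three classifier families the abstract names, and the neural-network meta-learner is likewise an assumption.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: each snippet is labeled with its author group.
# Encoding a group as a single class label is an assumption of this sketch.
snippets = [
    "for (int i = 0; i < n; i++) { sum += a[i]; }",
    "for (int j = 0; j < n; j++) { total += b[j]; }",
    "for (int i = 0; i < len; i++) { acc += v[i]; }",
    "for (int k = 0; k < m; k++) { s += x[k]; }",
    "int i=0; while(i<n){sum=sum+a[i];i=i+1;}",
    "int j=0; while(j<n){total=total+b[j];j=j+1;}",
    "int k=0; while(k<m){s=s+x[k];k=k+1;}",
    "int i=0; while(i<len){acc=acc+v[i];i=i+1;}",
    "std::accumulate(a.begin(), a.end(), 0);",
    "std::accumulate(b.begin(), b.end(), 0L);",
    "std::accumulate(v.cbegin(), v.cend(), 0);",
    "std::accumulate(x.begin(), x.end(), 0.0);",
]
labels = ["alice+bob"] * 4 + ["carol"] * 4 + ["bob+dave"] * 4


def build_model(seed=0):
    """Stack a neural network, a random forest, and a linear SVM."""
    base_learners = [
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                              random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("svm", LinearSVC(random_state=seed)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        # The meta-learner is trained on out-of-fold outputs of the base
        # learners; using a neural network here is an assumption.
        final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                      random_state=seed),
        cv=2,  # small fold count only because the toy corpus is tiny
    )
    # Character n-gram TF-IDF is an assumed feature representation.
    features = TfidfVectorizer(analyzer="char", ngram_range=(1, 3))
    return make_pipeline(features, stack)


model = build_model()
model.fit(snippets, labels)
predicted = model.predict(["for (int t = 0; t < n; t++) { sum += c[t]; }"])
print(predicted[0])
```

With so few samples the prediction itself carries no weight. The point of the sketch is the structure: base learners produce out-of-fold outputs, and a second-level classifier learns to combine them into an author-group prediction.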

Comments:	2019 22nd International Conference on Computer and Information Technology (ICCIT)
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2212.05610 [cs.SE]
	(or arXiv:2212.05610v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2212.05610
Related DOI:	https://doi.org/10.1109/ICCIT48885.2019.9038412

Submission history

From: Parvez Mahbub
[v1] Sun, 11 Dec 2022 21:49:08 UTC (784 KB)
