In Nomine Function: Naming Functions in Stripped Binaries with Neural Networks

Artuso, Fiorella; Di Luna, Giuseppe Antonio; Massarelli, Luca; Querzoni, Leonardo

arXiv:1912.07946 (cs)

[Submitted on 17 Dec 2019 (v1), last revised 4 Feb 2021 (this version, v3)]

Abstract: In this paper we investigate the problem of automatically naming pieces of assembly code, where by naming we mean assigning to an assembly function a string of words that a human reverse engineer would likely assign. We formally and precisely define the framework in which our investigation takes place: we define the problem and provide reasonable justifications for the choices we made in designing the training and the tests. We performed an analysis on a large real-world corpus consisting of nearly 9 million functions taken from more than 22k pieces of software. Within this framework we test baselines from the field of Natural Language Processing (e.g., Seq2Seq networks and the Transformer). Our evaluation shows promising results, beating the state of the art and reaching good performance. We also investigate the applicability of fine-tuning (i.e., taking a model already trained on a large generic corpus and retraining it for a specific task), a technique that is popular and well known in the NLP field. Our results confirm that fine-tuning is effective even when neural networks are applied to binaries: a model pre-trained on the aforementioned corpus, when fine-tuned, achieves higher performance on specific domains (such as predicting names in system utilities, malware, etc.).

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:1912.07946 [cs.LG]
	(or arXiv:1912.07946v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1912.07946

Submission history

From: Giuseppe Antonio Di Luna
[v1] Tue, 17 Dec 2019 11:59:41 UTC (21 KB)
[v2] Tue, 25 Feb 2020 16:17:59 UTC (56 KB)
[v3] Thu, 4 Feb 2021 09:31:56 UTC (56 KB)
