Multiple Instance Learning for Malware Classification

Stiborek, Jan; Pevný, Tomáš; Rehák, Martin

arXiv:1705.02268 (cs)

[Submitted on 5 May 2017]

Abstract: This work addresses the classification of unknown binaries executed in a sandbox by modeling their interaction with system resources (files, mutexes, registry keys, and communication with servers over the network) and the error messages provided by the operating system, using a vocabulary-based method from the multiple instance learning paradigm. It introduces similarities suited to individual resource types that, combined with an approximate clustering method, efficiently group the system resources and define features directly from data. This approach effectively removes the randomization often employed by malware authors and projects samples into a low-dimensional feature space suitable for common classifiers. An extensive comparison to the state of the art on a large corpus of binaries demonstrates that the proposed solution achieves superior results using only a fraction of the training samples. Moreover, it uses a source of information different from that of most prior art, which increases the diversity of malware detection tools and hence makes detection evasion more difficult.
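The vocabulary-based idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the token-based Jaccard distance, the masking of digit-bearing tokens, the greedy "leader" clustering, the 0.5 threshold, and all function names and sample resource names are assumptions chosen for illustration. Each sample is treated as a bag of resource names, similar names are grouped around prototypes, and a bag is encoded by which prototypes it touches.

```python
import re


def tokens(name):
    """Split a resource name into lowercase tokens.

    Tokens containing a digit are replaced by a placeholder, a crude
    stand-in (assumption) for removing randomized name parts.
    """
    parts = re.split(r"[\\/._\-]+", name.lower())
    return frozenset("<rnd>" if re.search(r"\d", p) else p for p in parts if p)


def jaccard_distance(a, b):
    """Jaccard distance between two token sets (0 = identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_vocabulary(bags, threshold=0.5):
    """Greedy leader clustering over all resource names in all bags.

    A name becomes a new prototype only if it is farther than
    `threshold` from every existing prototype.
    """
    prototypes = []
    for bag in bags:
        for name in bag:
            t = tokens(name)
            if all(jaccard_distance(t, p) > threshold for p in prototypes):
                prototypes.append(t)
    return prototypes


def embed(bag, prototypes, threshold=0.5):
    """Encode a bag as a binary vector: 1 if any resource in the bag
    falls within `threshold` of the corresponding prototype."""
    token_sets = [tokens(n) for n in bag]
    return [
        int(any(jaccard_distance(t, p) <= threshold for t in token_sets))
        for p in prototypes
    ]


# Hypothetical sandbox traces: each bag lists resources touched by one sample.
bag_a = ["C:\\Temp\\a8f3.tmp", "C:\\Windows\\explorer.exe"]
bag_b = ["C:\\Temp\\77b1.tmp", "HKLM\\Software\\Run"]
bag_c = ["C:\\Temp\\zz91.tmp"]

vocabulary = build_vocabulary([bag_a, bag_b])
features = {
    "a": embed(bag_a, vocabulary),
    "b": embed(bag_b, vocabulary),
    "c": embed(bag_c, vocabulary),
}
print(len(vocabulary), features)
```

In this toy setup the randomly named temporary files collapse onto a single prototype, so samples that differ only in such names receive the same feature for it. The resulting fixed-length vectors could then be passed to any standard classifier.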

Subjects:	Cryptography and Security (cs.CR)
Cite as:	arXiv:1705.02268 [cs.CR]
	(or arXiv:1705.02268v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.1705.02268

Submission history

From: Jan Stiborek
[v1] Fri, 5 May 2017 15:35:44 UTC (298 KB)
