Transformer-Boosted Anomaly Detection with Fuzzy Hashes

Uhlig, Frieder; Struppek, Lukas; Hintersdorf, Dominik; Kersting, Kristian

arXiv:2208.11367v2 (cs)

[Submitted on 24 Aug 2022 (v1), revised 21 Sep 2022 (this version, v2), latest version 27 Apr 2023 (v3)]

Abstract: Fuzzy hashes are an important tool in digital forensics and are used in approximate matching to determine the similarity between digital artifacts. They translate the byte code of files into computable strings, which makes them particularly interesting for intelligent machine processing. In this work, we propose deep learning approximate matching (DLAM), which achieves much higher accuracy in detecting anomalies in fuzzy hashes than conventional approaches. In addition to the well-known application for clustering malware, we show that fuzzy hashes and deep learning are indeed well-suited to classify files according to the presence of certain content, e.g., malware. DLAM relies on transformer-based models from the field of natural language processing and outperforms existing methods. Traditional fuzzy hashes like TLSH and ssdeep have a limited size and fail to detect file anomalies if they are relatively small compared to the overall file size. DLAM, however, enables the detection of such file correlations in the computed fuzzy hashes of TLSH and ssdeep, even for anomaly sizes of less than 15%. It achieves comparable results to state-of-the-art fuzzy hashing algorithms while relying on more efficient hash computations and can, therefore, be used at a much larger scale.
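Illustrative sketch (not taken from the paper): the abstract describes fuzzy hashes as strings that a transformer-based model can process. One common way to prepare such strings for a sequence model is character-level tokenization with padding to a fixed length. The snippet below shows only that preprocessing step. The function names, the special tokens, and the example digests are hypothetical, and the tokenization actually used by DLAM may differ.

```python
from typing import Dict, List

# Hypothetical special tokens; not taken from the paper.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(digests: List[str]) -> Dict[str, int]:
    """Map every character seen in the digests to an integer ID."""
    vocab = {PAD: 0, UNK: 1}
    for ch in sorted(set("".join(digests))):
        vocab[ch] = len(vocab)
    return vocab


def encode(digest: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn one digest into a fixed-length list of token IDs."""
    # Truncate to max_len, map unknown characters to UNK, then right-pad.
    ids = [vocab.get(ch, vocab[UNK]) for ch in digest[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Made-up strings in a "blocksize:hash:hash" layout, for illustration only.
digests = ["3:abcDEF:ghI", "3:abcXYZ:ghJ"]
vocab = build_vocab(digests)
batch = [encode(d, vocab, max_len=16) for d in digests]
```

The resulting `batch` is a list of equal-length integer sequences. In a full pipeline, these would be fed to an embedding layer and a transformer classifier that predicts whether the hashed file contains the content of interest.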

Comments:	9 pages, 4 figures, 2 tables
Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2208.11367 [cs.CR]
	(or arXiv:2208.11367v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2208.11367

Submission history

From: Lukas Struppek
[v1] Wed, 24 Aug 2022 08:26:49 UTC (605 KB)
[v2] Wed, 21 Sep 2022 06:03:08 UTC (1,939 KB)
[v3] Thu, 27 Apr 2023 15:43:32 UTC (632 KB)
