Computer Science > Cryptography and Security
[Submitted on 24 Aug 2022 (v1), revised 21 Sep 2022 (this version, v2), latest version 27 Apr 2023 (v3)]
Title: Transformer-Boosted Anomaly Detection with Fuzzy Hashes
Abstract: Fuzzy hashes are an important tool in digital forensics and are used in approximate matching to determine the similarity between digital artifacts. They translate the byte content of files into computable strings, which makes them particularly interesting for intelligent machine processing. In this work, we propose deep learning approximate matching (DLAM), which achieves much higher accuracy in detecting anomalies in fuzzy hashes than conventional approaches. In addition to the well-known application of clustering malware, we show that fuzzy hashes and deep learning are indeed well suited to classifying files according to the presence of certain content, e.g., malware. DLAM relies on transformer-based models from the field of natural language processing and outperforms existing methods. Traditional fuzzy hashes such as TLSH and ssdeep have a limited size and fail to detect file anomalies that are small relative to the overall file size. DLAM, however, enables the detection of such file correlations in the computed fuzzy hashes of TLSH and ssdeep, even for anomalies covering less than 15% of the file. It achieves results comparable to state-of-the-art fuzzy hashing algorithms while relying on more efficient hash computations and can therefore be used at a much larger scale.
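To make the point about limited digest size concrete, below is a heavily simplified sketch of context-triggered piecewise hashing, the idea behind ssdeep. It is neither ssdeep's actual algorithm (which uses a different rolling hash and chunk hash and emits digests for two block sizes) nor DLAM. The plain byte-sum rolling hash, the SHA-1-derived chunk character, and the names `ctph_digest` and `fuzzy_hash` are illustrative assumptions.

```python
import hashlib

# Base64 alphabet: one output character is emitted per content-defined chunk.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

WINDOW = 7        # rolling-window length in bytes
MAX_DIGEST = 64   # cap on digest length: the "limited size" of the hash


def rolling_sums(data: bytes, window: int = WINDOW):
    """Yield (index, sum of the last `window` bytes) for each position.

    Assumption: a plain byte sum stands in for ssdeep's real rolling hash.
    """
    total = 0
    for i, b in enumerate(data):
        total += b
        if i >= window:
            total -= data[i - window]
        yield i, total


def ctph_digest(data: bytes, block_size: int) -> str:
    """Simplified context-triggered piecewise hash (illustrative only)."""
    digest = []
    start = 0
    for i, rsum in rolling_sums(data):
        # Chunk boundaries are triggered by content, not fixed offsets,
        # so a local edit only disturbs nearby boundaries.
        if rsum % block_size == block_size - 1:
            piece = data[start:i + 1]
            # Assumption: first byte of SHA-1 replaces ssdeep's chunk hash.
            digest.append(B64[hashlib.sha1(piece).digest()[0] % 64])
            start = i + 1
    if start < len(data):
        # Hash the trailing bytes after the last boundary.
        digest.append(B64[hashlib.sha1(data[start:]).digest()[0] % 64])
    return "".join(digest)


def fuzzy_hash(data: bytes) -> str:
    """Double the block size until the digest fits into MAX_DIGEST chars."""
    block_size = 3
    while True:
        d = ctph_digest(data, block_size)
        if len(d) <= MAX_DIGEST:
            return f"{block_size}:{d}"
        block_size *= 2
```

Because the digest is capped at 64 characters, each character of a large file's hash summarizes on the order of `len(data) / 64` bytes or more, so an anomaly covering a small fraction of the file changes only a handful of characters. This is the weak signal that the abstract says conventional similarity scores miss.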
Submission history
From: Lukas Struppek
[v1] Wed, 24 Aug 2022 08:26:49 UTC (605 KB)
[v2] Wed, 21 Sep 2022 06:03:08 UTC (1,939 KB)
[v3] Thu, 27 Apr 2023 15:43:32 UTC (632 KB)