Computer Science > Cryptography and Security
[Submitted on 23 May 2019 (this version), latest version 19 Sep 2019 (v2)]
Title:Detecting Malicious PowerShell Scripts Using Contextual Embeddings
View PDFAbstract:PowerShell is a command line shell, that is widely used in organizations for configuration management and task automation. Unfortunately, PowerShell is also increasingly used by cybercriminals for launching cyber attacks against organizations, mainly because it is pre-installed on Windows machines and it exposes strong functionality that may be leveraged by attackers. This makes the problem of detecting malicious PowerShell scripts both urgent and challenging.
We address this important problem by presenting several novel deep learning based detectors of malicious PowerShell scripts. Our best model obtains a true positive rate of nearly 90% while maintaining a low false positive rate of less than 0.1%, indicating that it can be of practical value.
Our models employ pre-trained contextual embeddings of words from the PowerShell "language". A contextual word embedding is able to project semantically similar words to proximate vectors in the embedding space. A known problem in the cybersecurity domain is that labeled data is relatively scarce in comparison with unlabeled data, making it difficult to devise effective supervised detection of malicious activity of many types. This is also the case with PowerShell scripts. Our work shows that this problem can be largely mitigated by learning a pre-trained contextual embedding based on unlabeled data.
We trained our models' embedding layer using a scripts dataset that was enriched by a large corpus of unlabeled PowerShell scripts collected from public repositories. As established by our performance analysis, the use of unlabeled data for the embedding significantly improved the performance of our detectors. We estimate that the usage of pre-trained contextual embeddings based on unlabeled data for improved classification accuracy will find additional applications in the cybersecurity domain.
Submission history
From: Amir Rubin [view email][v1] Thu, 23 May 2019 08:53:52 UTC (819 KB)
[v2] Thu, 19 Sep 2019 05:37:35 UTC (1,083 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.