PowerShell is a command-line shell with an associated scripting language. It is widely used in organizations for configuration management and task automation, but it is also increasingly used by cybercriminals to launch cyberattacks against organizations, mainly because it is pre-installed on Windows machines and exposes powerful functionality that attackers can leverage. This makes the problem of detecting malicious PowerShell code both urgent and challenging. Microsoft's Antimalware Scan Interface (AMSI) allows defending systems to scan all code passed to scripting engines such as PowerShell before it is executed. In this work, we conduct the first study of malicious PowerShell code detection using the information made available by AMSI. We present several novel deep-learning-based detectors of malicious PowerShell code that employ pretrained contextual embeddings of words from the PowerShell "language". A known problem in the cybersecurity domain is that labeled data is scarce relative to unlabeled data, which makes it difficult to devise effective supervised detectors for many types of malicious activity. This is also the case for PowerShell code. Our work shows that this problem can be mitigated by learning a pretrained contextual embedding from unlabeled data. We trained and evaluated our models on real-world data, collected using AMSI from a large antimalware vendor. Our performance analysis establishes that using unlabeled data for the embedding significantly improves the performance of our detectors. Our best-performing model uses an architecture that processes textual signals at both the character and token levels, and it obtains a true positive rate of nearly 90% while maintaining a false positive rate of less than 0.1%.
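The headline result is reported as a true positive rate at a bounded false positive rate. As an illustration of how such an operating point can be read off a scored evaluation set, here is a minimal sketch. It is not the authors' evaluation code: the function name `tpr_at_fpr`, the convention that a higher score means "more malicious", and the toy scores and labels are all assumptions made for this example.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Pick the lowest threshold whose false positive rate stays <= max_fpr.

    scores: detector outputs, higher means more likely malicious (assumed).
    labels: 1 for malicious, 0 for benign.
    A sample is flagged when its score is >= the returned threshold.
    Returns (threshold, true_positive_rate, false_positive_rate).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malicious and one benign sample")

    # Start from the operating point that flags nothing.
    best = (float("inf"), 0.0, 0.0)
    tp = fp = 0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before evaluating the point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr = fp / n_neg
        if fpr > max_fpr:
            break  # lowering the threshold further can only raise the FPR
        best = (threshold, tp / n_pos, fpr)
    return best


# Toy data, invented for illustration only (not from the paper).
toy_scores = [0.99, 0.95, 0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

threshold, tpr, fpr = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
print(threshold, tpr, fpr)
```

With a budget as tight as 0.1%, the benign evaluation set must contain well over a thousand samples for the measured false positive rate to be distinguishable from zero, which is one reason large real-world collections matter for this kind of evaluation.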