In this paper, we present a scientific evaluation of four prominent malware detection tools to help an organization answer two primary questions: To what extent do ML-based tools accurately classify previously seen and never-before-seen files? Is it worth purchasing a network-level malware detector? To identify weaknesses, we tested each tool against 3,536 files (2,554 or 72\% malicious; 982 or 28\% benign) spanning a variety of file types, including hundreds of malicious zero-days, polyglots, and APT-style files, delivered over multiple protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of the recent cost-benefit evaluation procedure of Iannacone \& Bridges. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be the better overall option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files: 37\% of the malware tested, including all polyglot files, went undetected. We close with priorities for researchers and takeaways for end users.
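To make the precision/recall contrast concrete, the following minimal Python sketch checks the corpus split stated above and computes both metrics for one tool. Only the corpus sizes (2,554 malicious, 982 benign) come from the abstract; the per-tool counts `hyp_tp` and `hyp_fp` are hypothetical values chosen for illustration and are not results from the evaluation.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of files flagged as malicious that really are malicious."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly malicious files that the tool flagged."""
    actual = tp + fn
    return tp / actual if actual else 0.0


# Corpus sizes as stated in the abstract.
MALICIOUS = 2554
BENIGN = 982
TOTAL = MALICIOUS + BENIGN

# HYPOTHETICAL per-tool counts, for illustration only (not from the paper).
hyp_tp = 1000                  # malicious files the tool flagged
hyp_fp = 5                     # benign files the tool flagged by mistake
hyp_fn = MALICIOUS - hyp_tp    # malicious files the tool missed

if __name__ == "__main__":
    print(f"corpus size:     {TOTAL}")
    print(f"malicious share: {MALICIOUS / TOTAL:.1%}")
    print(f"benign share:    {BENIGN / TOTAL:.1%}")
    print(f"precision (hypothetical): {precision(hyp_tp, hyp_fp):.3f}")
    print(f"recall (hypothetical):    {recall(hyp_tp, hyp_fn):.3f}")
```

With counts of this shape, very few false alarms keep precision close to 1, while a large number of missed malicious files pulls recall down. That is the pattern the abstract describes.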