Computer vision is playing an increasingly important role in automated malware detection, driven by the rise of image-based binary representations. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area; however, it has been restricted to small-scale or private datasets that only a few industry labs and research teams can access. This lack of availability hinders the examination of existing work, the development of new research, and the dissemination of ideas. We introduce MalNet, the largest publicly available cybersecurity image database, offering 133x more images and 27x more classes than the only other public binary-image database. MalNet contains over 1.2 million images across a hierarchy of 47 types and 696 families. We provide an extensive analysis of MalNet, discussing its properties and provenance. The scale and diversity of MalNet unlock new and exciting cybersecurity opportunities for the computer vision community, enabling discoveries and research directions that were previously not possible. The database is publicly available at www.mal-net.org.
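To make the idea of an image-based binary representation concrete, the following minimal sketch shows one common way such images can be produced: each byte of a file becomes one grayscale pixel, and the byte stream is wrapped into rows of a fixed width. This is an illustration under stated assumptions, not necessarily the procedure used to build MalNet; the function names `bytes_to_grayscale` and `to_pgm`, the default width of 256, and the zero padding of the last row are all hypothetical choices made for this example.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte to one pixel (0-255) and wrap rows at `width`.

    Assumption for illustration: the final row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data.ljust(height * width, b"\x00")  # zero-pad the last row
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(rows: list[list[int]]) -> bytes:
    """Serialize the pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)


# Tiny stand-in for a binary file: ten bytes wrapped into rows of four pixels.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
pgm_bytes = to_pgm(image)
```

No hand-crafted features are extracted here, which is the property the abstract highlights: the raw bytes are simply reinterpreted as pixel intensities, so the conversion is a single linear pass over the file.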