Computer vision is playing an increasingly important role in automated malware detection with the rise of image-based binary representations. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area; however, it has been restricted to small-scale or private datasets that only a few industry labs and research teams can access. This lack of availability hinders examination of existing work, development of new research, and dissemination of ideas. We release MalNet-Image, the largest public cybersecurity image database, offering 24x more images and 70x more classes than existing databases (available at https://mal-net.org). MalNet-Image contains over 1.2 million malware images, spanning 47 types and 696 families, democratizing image-based malware capabilities by enabling researchers and practitioners to evaluate techniques that were previously reported in proprietary settings. We report the first million-scale malware detection results on binary images. MalNet-Image unlocks new and unique opportunities to advance the frontiers of machine learning, enabling new research directions into vision-based cyber defenses, multi-class imbalanced classification, and interpretable security.
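To illustrate why such binary images are fast to generate and need no feature engineering, the following minimal sketch maps each byte of a file to one grayscale pixel. It is a generic byte-to-pixel encoding under stated assumptions, not necessarily the pipeline used to build MalNet-Image; the function names `bytes_to_grayscale` and `to_pgm`, the default width of 256, and the zero-padding of the last row are all illustrative choices.

```python
import math
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Map each byte (0-255) to one grayscale pixel, filling rows left to right.

    Assumption: a fixed row width and zero-padding of the final row.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so the last row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(pixels: List[List[int]]) -> bytes:
    """Serialize the pixel grid as a binary PGM (P5) image for viewing."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in pixels)


# Example: five bytes laid out in rows of four pixels.
image = bytes_to_grayscale(b"\x00\x7f\xff\x10\x20", width=4)
print(image)  # [[0, 127, 255, 16], [32, 0, 0, 0]]
```

Because the encoding is a direct reshape of the raw bytes, its cost is linear in file size, and the resulting image can be passed to a standard image classifier without hand-crafted features.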