This paper describes a multi-feature dataset for training machine learning classifiers to detect malicious Windows Portable Executable (PE) files. The dataset includes four feature sets extracted from 18,551 binary samples belonging to five malware families: Spyware, Ransomware, Downloader, Backdoor and Generic Malware. The feature sets comprise the imported DLLs, the imported functions, and the values of various fields of the PE header and sections. First, we explain the data collection and creation phase, and then we explain how we labeled the samples using VirusTotal's services. Finally, we explore the dataset to show how it can benefit researchers working on static malware analysis. We make the dataset publicly available in the hope that it will inspire machine learning research on malware detection.
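As an illustration of how a list of imported DLLs and functions can be turned into classifier input, the sketch below builds a binary presence vector against fixed vocabularies. The function `encode_imports` and the vocabularies are hypothetical; the encoding is an assumption for illustration, not the authors' method. In practice the import table could be read with a PE parser such as `pefile`, whose output would be reduced to the `{dll: [functions]}` mapping used here.

```python
from typing import Dict, List, Sequence


def encode_imports(
    imports: Dict[str, List[str]],
    dll_vocab: Sequence[str],
    func_vocab: Sequence[str],
) -> List[int]:
    """Hypothetical bag-of-imports encoding (assumption, not the paper's scheme).

    Returns one bit per DLL in dll_vocab followed by one bit per function in
    func_vocab; a bit is 1 if the sample imports that DLL or function.
    """
    # DLL names are case-insensitive on Windows, so compare them lowercased.
    dlls = {name.lower() for name in imports}
    # Flatten all imported function names across DLLs (kept case-sensitive).
    funcs = {f for fs in imports.values() for f in fs}
    dll_bits = [1 if d.lower() in dlls else 0 for d in dll_vocab]
    func_bits = [1 if f in funcs else 0 for f in func_vocab]
    return dll_bits + func_bits


if __name__ == "__main__":
    sample = {
        "KERNEL32.dll": ["CreateFileW", "WriteFile"],
        "ADVAPI32.dll": ["RegSetValueExW"],
    }
    print(encode_imports(
        sample,
        dll_vocab=["kernel32.dll", "advapi32.dll", "ws2_32.dll"],
        func_vocab=["CreateFileW", "RegSetValueExW", "connect"],
    ))  # → [1, 1, 0, 1, 1, 0]
```

A fixed vocabulary keeps every sample's vector the same length, which most classifiers require; the vocabulary would typically be built from the DLLs and functions observed across the training set.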