We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications and covers a wide range of features along with extensive metadata. To label applications accurately, we use the VirusTotal API, which combines the results of multiple detection engines into a more reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its supplementary metadata. Together they total more than 400 GB of data and include the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the value of the MH-1M dataset for understanding the evolving malware landscape.