Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification

Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of graph-based classification methods. To address this gap, we introduce Better Call Graphs (BCG), a comprehensive dataset of large and unique FCGs extracted from recent Android application packages (APKs). BCG includes both benign and malicious samples spanning various families and types, along with graph-level features for each APK. Through extensive experiments using baseline classifiers, we demonstrate the necessity and value of BCG compared to existing datasets. BCG is publicly available at https://erdemub.github.io/BCG-dataset.
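
The abstract says BCG provides graph-level features for each APK, and that existing datasets are dominated by small or redundant graphs. The Python sketch below illustrates both ideas under stated assumptions. Each FCG is assumed to arrive as a list of (caller, callee) method-signature pairs. The feature names (`num_nodes`, `density`, `num_roots`, and so on) and helper functions (`build_fcg`, `graph_features`, `fingerprint`, `dedup`) are hypothetical. None of this is BCG's actual feature schema or extraction pipeline. Duplicates are detected only when two graphs have exactly the same labeled nodes and edges.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee) method signatures; assumed input format
FCG = Dict[str, Set[str]]  # adjacency map: caller -> set of callees


def build_fcg(edges: Iterable[Edge]) -> FCG:
    """Build a directed adjacency map; every callee also becomes a node."""
    adj: FCG = {}
    for caller, callee in edges:
        adj.setdefault(caller, set()).add(callee)
        adj.setdefault(callee, set())  # leaf callees still count as nodes
    return adj


def graph_features(adj: FCG) -> Dict[str, float]:
    """Hypothetical graph-level features for one APK's FCG."""
    n = len(adj)
    m = sum(len(callees) for callees in adj.values())
    in_degree = {node: 0 for node in adj}
    for callees in adj.values():
        for callee in callees:
            in_degree[callee] += 1
    return {
        "num_nodes": n,
        "num_edges": m,
        # Directed density m / (n * (n - 1)); recursive self-calls are counted in m
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "avg_out_degree": m / n if n else 0.0,
        "max_out_degree": max((len(c) for c in adj.values()), default=0),
        # Functions never called inside the graph (e.g. externally invoked entry points)
        "num_roots": sum(1 for d in in_degree.values() if d == 0),
    }


def fingerprint(adj: FCG) -> str:
    """SHA-256 over sorted nodes and edges; equal only for identical labeled graphs."""
    nodes = sorted(adj)
    edges = sorted([u, v] for u, callees in adj.items() for v in callees)
    payload = json.dumps([nodes, edges], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedup(graphs: Dict[str, FCG]) -> List[str]:
    """Keep the first APK id seen for each distinct fingerprint."""
    seen: Set[str] = set()
    kept: List[str] = []
    for apk_id, adj in graphs.items():
        fp = fingerprint(adj)
        if fp not in seen:
            seen.add(fp)
            kept.append(apk_id)
    return kept


if __name__ == "__main__":
    a = build_fcg([("Main.onCreate", "Util.send"), ("Util.send", "Net.post")])
    # Same edges listed in a different order, so it yields the same graph
    b = build_fcg([("Util.send", "Net.post"), ("Main.onCreate", "Util.send")])
    c = build_fcg([("Main.onCreate", "Util.log")])
    print(graph_features(a))
    print(dedup({"apk_a": a, "apk_b": b, "apk_c": c}))  # ['apk_a', 'apk_c']
```

Exact hashing is the simplest possible filter. It misses near-duplicates, such as a repackaged app with a few injected methods, and any renaming of methods (for example by obfuscation) changes the hash. A structural hash such as NetworkX's `weisfeiler_lehman_graph_hash`, or a similarity threshold on edge sets, would be one way to relax it. Which approach BCG itself uses is not stated here.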
