Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of graph-based classification methods. To address this gap, we introduce Better Call Graphs (BCG), a comprehensive dataset of large and unique FCGs extracted from recent Android application packages (APKs). BCG includes both benign and malicious samples spanning various families and types, along with graph-level features for each APK. Through extensive experiments using baseline classifiers, we demonstrate the necessity and value of BCG compared to existing datasets. BCG is publicly available at https://erdemub.github.io/BCG-dataset.