Malware detection has become a major concern due to the growing number and complexity of malware. Traditional detection methods based on signatures and heuristics generalize poorly to unknown attacks and can easily be circumvented with obfuscation techniques. In recent years, Machine Learning (ML), and notably Deep Learning (DL), has achieved impressive results in malware detection by learning useful representations from data, and has become preferred over traditional methods. More recently, applying these techniques to graph-structured data has achieved state-of-the-art performance in various domains and has shown promising results in learning more robust representations of malware. Yet no literature review focusing on graph-based deep learning for malware detection exists. In this survey, we provide an in-depth literature review that summarizes existing works and unifies them under common approaches and architectures. In particular, we show that Graph Neural Networks (GNNs) achieve competitive results in learning robust embeddings from malware represented as expressive graph structures, enabling efficient detection by downstream classifiers. This paper also reviews the adversarial attacks used to fool graph-based detection methods. Finally, we discuss challenges and future research directions.
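The pipeline summarized above (a GNN embeds a malware graph, and a downstream classifier makes the detection decision) can be illustrated with a minimal, dependency-free sketch. It assumes a control-flow graph (CFG) whose basic blocks carry hypothetical instruction-count features, uses one GCN-style mean-aggregation layer, mean pooling as the graph readout, and logistic regression as the downstream classifier. The names `gcn_layer`, `mean_readout`, and `detect`, the feature layout, and all weights are illustrative assumptions, not taken from any surveyed work; a real detector would learn the weights with a GNN library.

```python
import math
from typing import Dict, List

Graph = Dict[int, List[int]]       # node id -> successor node ids (e.g. CFG edges)
Features = Dict[int, List[float]]  # node id -> feature vector


def gcn_layer(adj: Graph, feats: Features, weight: List[List[float]]) -> Features:
    # Treat edges as undirected so information flows both ways along the graph.
    nbrs = {n: set() for n in feats}
    for u, succs in adj.items():
        for v in succs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    dim = len(next(iter(feats.values())))
    out: Features = {}
    for n in feats:
        group = [n] + sorted(nbrs[n] - {n})
        # Mean aggregation over the node itself and its neighbours.
        agg = [sum(feats[m][i] for m in group) / len(group) for i in range(dim)]
        # Linear transform (each weight row is one output unit), then ReLU.
        out[n] = [max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight]
    return out


def mean_readout(feats: Features) -> List[float]:
    # Pool the node embeddings into a single graph-level embedding.
    vecs = list(feats.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def detect(embedding: List[float], w: List[float], b: float) -> float:
    # Downstream logistic-regression classifier: estimated P(malicious | graph).
    z = sum(wi * xi for wi, xi in zip(w, embedding)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Toy CFG with 4 basic blocks; hypothetical features = [calls, arithmetic ops, jumps].
cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}
x = {0: [1.0, 2.0, 1.0], 1: [0.0, 3.0, 1.0], 2: [2.0, 0.0, 1.0], 3: [1.0, 1.0, 0.0]}
W1 = [[0.5, 0.1, 0.2], [0.0, 0.3, 0.4]]  # untrained, illustrative weights
h = gcn_layer(cfg, x, W1)
g = mean_readout(h)
p = detect(g, w=[1.5, -0.7], b=-0.2)
print(f"graph embedding={g}, P(malicious)={p:.3f}")
```

Mean pooling keeps the graph embedding size fixed regardless of how many basic blocks a sample has, which is what lets a fixed-size downstream classifier consume graphs of arbitrary size; surveyed systems may use other readouts or deeper GNNs.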