Vulnerability Detection with Graph Simplification and Enhanced Graph Representation Learning

Prior studies have demonstrated the effectiveness of Deep Learning (DL) in automated software vulnerability detection. Graph Neural Networks (GNNs) have proven effective in learning graph representations of source code and are commonly adopted by existing DL-based vulnerability detection methods. However, existing methods remain limited because GNNs inherently struggle to handle connections between long-distance nodes in a code structure graph. Moreover, they do not fully exploit the multiple edge types in a code structure graph (such as edges representing data flow and control flow). Consequently, despite achieving state-of-the-art performance, existing GNN-based methods tend to fail to capture global information (i.e., long-range dependencies among nodes) of code graphs. To mitigate these issues, in this paper we propose a novel vulnerability detection framework with grAph siMplification and enhanced graph rePresentation LEarning, named AMPLE. AMPLE consists of two main parts: 1) graph simplification, which aims to reduce the distances between nodes by reducing the number of nodes in code structure graphs; 2) enhanced graph representation learning, which involves an edge-aware graph convolutional network module that fuses heterogeneous edge information into node representations and a kernel-scaled representation module that captures the relations between distant graph nodes. Experiments on three public benchmark datasets show that AMPLE outperforms the state-of-the-art methods by 0.39%-35.32% and 7.64%-199.81% with respect to the accuracy and F1 score metrics, respectively. The results demonstrate the effectiveness of AMPLE in learning global information of code graphs for vulnerability detection.
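The abstract does not give the equations of the edge-aware graph convolutional module. To illustrate what "fusing heterogeneous edge information into node representations" can mean, here is a minimal sketch under an assumed relational-GCN (R-GCN) style formulation. Each edge type gets its own weight matrix, and messages are averaged per edge type before being added to a self-connection term. The names `edge_aware_conv`, `w_self`, `w_type`, and the edge labels `"control_flow"` and `"data_flow"` are hypothetical. This is not AMPLE's actual implementation.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x (in_dim)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]


def edge_aware_conv(
    h: List[Vector],
    edges: List[Tuple[int, int, str]],
    w_self: Matrix,
    w_type: Dict[str, Matrix],
) -> List[Vector]:
    """One hypothetical edge-type-aware layer (R-GCN style assumption):

    h_i' = ReLU(W_self h_i + sum_r (1/|N_r(i)|) sum_{j in N_r(i)} W_r h_j)
    where N_r(i) are the sources of edges of type r pointing at node i.
    """
    n = len(h)
    # Group the incoming neighbours of each node by edge type.
    neigh: List[Dict[str, List[int]]] = [dict() for _ in range(n)]
    for src, dst, etype in edges:
        neigh[dst].setdefault(etype, []).append(src)

    out: List[Vector] = []
    for i in range(n):
        acc = matvec(w_self, h[i])  # self-connection term
        for etype, srcs in neigh[i].items():
            w = w_type[etype]  # a separate weight matrix per edge type
            for j in srcs:
                msg = matvec(w, h[j])
                # Mean aggregation within this edge type.
                acc = [a + m / len(srcs) for a, m in zip(acc, msg)]
        out.append(relu(acc))
    return out


if __name__ == "__main__":
    # Toy graph: 3 statement nodes with 2-dim features (illustrative only).
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    edges = [(0, 1, "control_flow"), (0, 2, "data_flow"), (1, 2, "control_flow")]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    w_type = {"control_flow": identity, "data_flow": [[2.0, 0.0], [0.0, 2.0]]}
    print(edge_aware_conv(h, edges, identity, w_type))
    # → [[1.0, 0.0], [1.0, 1.0], [3.0, 2.0]]
```

Keeping one weight matrix per edge type lets the layer treat a data-flow neighbour differently from a control-flow neighbour. In the toy graph, node 2 becomes `[3.0, 2.0]`. If both edge types shared the same weights, it would become `[2.0, 2.0]`, which shows the information that a single merged adjacency matrix would discard.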
