With the rapidly increasing number of open source software (OSS) projects, most vulnerabilities in open source components are fixed silently, so deployed software that integrates these components cannot be updated in a timely manner. Hence, it is critical to design a security patch identification system to ensure the security of the software in use. However, most existing work on security patch identification treats the changed code and the commit message of a commit as a flat sequence of tokens and uses simple neural networks to learn its semantics, ignoring structure information. To address these limitations, we propose E-SPI, an approach that extracts the structure information hidden in a commit for effective identification. Specifically, it consists of two encoders: a code change encoder, which extracts the syntactic structure of the changed code and uses a BiLSTM to learn the code representation, and a message encoder, which constructs a dependency graph for the commit message and uses a graph neural network (GNN) to learn the message representation. We further enhance the code change encoder by embedding contextual information related to the changed code. To demonstrate the effectiveness of our approach, we conduct extensive experiments against six state-of-the-art approaches on an existing dataset and in a real deployment environment. The experimental results confirm that our approach significantly outperforms current state-of-the-art baselines.
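
To make the message encoder idea concrete, the following is a minimal sketch of one round of neighbor aggregation over a dependency graph of a commit message, which is the basic operation behind a GNN layer. It is not the E-SPI implementation: the abstract does not specify the parser, the node features, or the GNN variant, so the example commit message, the hand-written dependency edges, the one-hot features, and the function names `build_graph`, `mean_aggregate`, and `readout` are all assumptions made for illustration, and the learned weights of a real GNN are omitted.

```python
from collections import defaultdict


def build_graph(edges, num_nodes):
    """Build an undirected adjacency list (with self-loops) from (head, dependent) pairs."""
    neighbors = defaultdict(set)
    for i in range(num_nodes):
        neighbors[i].add(i)  # self-loop so a node keeps part of its own feature
    for head, dep in edges:
        neighbors[head].add(dep)
        neighbors[dep].add(head)
    return neighbors


def mean_aggregate(features, neighbors):
    """One round of message passing: each node becomes the mean of itself and its neighbors."""
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        nbrs = sorted(neighbors[i])
        out.append([sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


def readout(features):
    """Mean-pool all node vectors into a single message-level vector."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


# Hypothetical commit message; the edges are written by hand for illustration,
# where a real system would obtain them from a dependency parser.
tokens = ["fix", "buffer", "overflow", "in", "parser"]
edges = [(0, 2), (2, 1), (0, 4), (4, 3)]  # (head index, dependent index)

# Toy one-hot node features; a real encoder would use learned word embeddings.
features = [[1.0 if i == j else 0.0 for j in range(len(tokens))] for i in range(len(tokens))]

graph = build_graph(edges, len(tokens))
hidden = mean_aggregate(features, graph)
message_vector = readout(hidden)
```

After one round, the vector for "fix" mixes in its dependents "overflow" and "parser", while "buffer" only sees "overflow"; a flat token sequence would not expose these relations directly. Stacking several such rounds with learned transformations is what lets a GNN propagate information along longer dependency paths.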