PatchRNN: A Deep Learning-Based System for Security Patch Identification

With the increasing use of open-source software (OSS) components, vulnerabilities embedded within them are propagated to a huge number of downstream applications. In practice, the timely application of security patches in downstream software is challenging. The main reason is that such patches do not explicitly state their security impact in their documentation, which makes them difficult for software maintainers and users to recognize. However, attackers can still identify these "secret" security patches by analyzing the source code, and can then generate corresponding exploits to compromise not only unpatched versions of the same software, but also other similar software packages that may contain the same vulnerability due to code cloning or similar design/implementation logic. Therefore, it is critical to identify these secret security patches to enable timely fixes. To this end, we propose a deep learning-based defense system called PatchRNN to automatically identify secret security patches in OSS. Besides considering descriptive keywords in the commit message (i.e., at the text level), we leverage both syntactic and semantic features at the source-code level. To evaluate the performance of our system, we apply it to a large-scale real-world patch dataset and conduct a case study on NGINX, a popular open-source web server. Experimental results show that PatchRNN can successfully detect secret security patches with a low false positive rate.
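The abstract names two input levels: the commit message (text level) and the changed source code (source-code level). As a rough illustration of what separating those two inputs might look like, the sketch below splits a git-style patch into its message and its tokenized added and removed lines. It is not PatchRNN's actual preprocessing, which the abstract does not describe. The names `PatchInputs`, `tokenize_code`, `split_patch`, and `SAMPLE_PATCH` are hypothetical, the tokenizer is a crude regular expression rather than a real lexer, and the recurrent model that would consume these sequences is omitted.

```python
import re
from dataclasses import dataclass, field

# Hypothetical container for the two input levels mentioned in the abstract.
@dataclass
class PatchInputs:
    message: str                                   # text level: commit message
    added: list = field(default_factory=list)      # tokenized '+' lines
    removed: list = field(default_factory=list)    # tokenized '-' lines

# Crude tokenizer (an assumption, not a real C lexer): identifiers, integers,
# a few two-character operators, then any single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|&&|\|\||->|[^\s\w]")

def tokenize_code(line):
    """Split one line of source code into a flat list of tokens."""
    return TOKEN_RE.findall(line)

def split_patch(patch_text):
    """Separate a git-style patch into commit message and changed lines."""
    # Everything before the first 'diff --git' header is treated as the message.
    head, _, diff = patch_text.partition("\ndiff --git ")
    inputs = PatchInputs(message=head.strip())
    for line in diff.splitlines():
        # Skip the '---'/'+++' file headers. Simplification: a removed code
        # line that itself begins with '--' would also be skipped here.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            inputs.added.append(tokenize_code(line[1:]))
        elif line.startswith("-"):
            inputs.removed.append(tokenize_code(line[1:]))
    return inputs

# Made-up patch used only to exercise the functions above.
SAMPLE_PATCH = """Fix buffer handling in parser

diff --git a/parse.c b/parse.c
--- a/parse.c
+++ b/parse.c
@@ -10,3 +10,4 @@
-    memcpy(dst, src, len);
+    if (len > size) return -1;
+    memcpy(dst, src, len);
"""

inputs = split_patch(SAMPLE_PATCH)
```

In this made-up patch, the commit message never mentions security, while the added bounds check is the kind of source-level signal a classifier could learn from. The token sequences in `inputs.added` and `inputs.removed` are what a sequence model would take as input under these assumptions.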
