Code-centric Learning-based Just-In-Time Vulnerability Detection

Attacks against computer systems that exploit software vulnerabilities can cause substantial damage to the cyber-infrastructure of modern society and the economy. To minimize the consequences, it is vital to detect and fix vulnerabilities as early as possible. Just-in-time vulnerability detection (JIT-VD) identifies vulnerability-prone ("dangerous") commits so that they can be stopped before they are merged into the source code and introduce vulnerabilities. With JIT-VD, the commits' authors, who understand the changes best, can review dangerous commits and fix them if necessary while the relevant modifications are still fresh in their minds. In this paper, we propose CodeJIT, a novel code-centric, learning-based approach to just-in-time vulnerability detection. The key idea of CodeJIT is that the meaning of a commit's code changes is the direct and deciding factor in determining whether the commit is dangerous to the code. Based on this idea, we design a novel graph-based representation that captures the semantics of code changes in terms of both code structure and program dependencies. We then develop a graph neural network model that captures the meaning of the code changes encoded in this representation and learns to discriminate between dangerous and safe commits. We evaluated the JIT-VD performance of CodeJIT on a dataset of more than 20K dangerous and safe commits from 506 real-world projects spanning 1998 to 2022. Our results show that CodeJIT significantly outperforms state-of-the-art JIT-VD methods, by up to 66% in Recall, 136% in Precision, and 68% in F1. Moreover, CodeJIT correctly classifies nearly nine out of ten dangerous and safe (benign) commits, and it even detects 69 commits that fix a vulnerability yet introduce other issues into the source code.
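The abstract only outlines CodeJIT's design, so the following Python sketch merely illustrates the general idea of classifying a commit from a graph of its code changes. It is not CodeJIT's implementation: the statement-level node granularity, the change labels (`unchanged`, `added`, `deleted`), the edge kinds (`data`, `control`), the parameter-free mean aggregation standing in for a trained GNN, and the weights passed to the hypothetical `danger_score` function are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field

# Hypothetical per-node change labels (illustrative, not CodeJIT's schema).
CHANGE_LABELS = ("unchanged", "added", "deleted")


@dataclass
class Node:
    code: str    # source text of one statement (hypothetical granularity)
    change: str  # one of CHANGE_LABELS


@dataclass
class ChangeGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, kind) triples

    def add_node(self, code: str, change: str) -> int:
        if change not in CHANGE_LABELS:
            raise ValueError(f"unknown change label: {change}")
        self.nodes.append(Node(code, change))
        return len(self.nodes) - 1

    def add_edge(self, src: int, dst: int, kind: str) -> None:
        # kind is a hypothetical relation tag such as "ast", "control" or "data".
        self.edges.append((src, dst, kind))


def one_hot(change: str) -> list:
    return [1.0 if label == change else 0.0 for label in CHANGE_LABELS]


def message_passing(graph: ChangeGraph, layers: int = 2) -> list:
    """Parameter-free mean aggregation over undirected neighbours.

    Edge kinds are ignored here; a relational GNN would use separate learned
    weights per edge kind, plus learned weights and a nonlinearity per layer.
    """
    h = [one_hot(node.change) for node in graph.nodes]
    neighbours = [[] for _ in graph.nodes]
    for src, dst, _kind in graph.edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    for _ in range(layers):
        updated = []
        for i, own in enumerate(h):
            total = list(own)  # include the node's own features
            for j in neighbours[i]:
                total = [a + b for a, b in zip(total, h[j])]
            updated.append([a / (1 + len(neighbours[i])) for a in total])
        h = updated
    return h


def danger_score(graph: ChangeGraph, weights, bias: float = 0.0) -> float:
    """Mean-pool node states and apply a logistic scorer (untrained weights)."""
    h = message_passing(graph)
    pooled = [sum(column) / len(h) for column in zip(*h)]
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy commit that deletes a bounds check guarding a memcpy.
g = ChangeGraph()
read_len = g.add_node("len = read_len(src);", "unchanged")
check = g.add_node("if (len > MAX) return -1;", "deleted")
copy_stmt = g.add_node("memcpy(buf, src, len);", "unchanged")
g.add_edge(read_len, check, "data")      # the check uses len
g.add_edge(read_len, copy_stmt, "data")  # memcpy uses len
g.add_edge(check, copy_stmt, "control")  # memcpy runs only if the check passes
print(round(danger_score(g, weights=(0.0, 1.0, 2.0)), 3))  # prints 0.661
```

The point of the graph is that the deleted check is scored together with the unchanged statements it guards and depends on, rather than as an isolated diff line; in a trained model the weights would be learned from labeled dangerous and safe commits.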

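The abstract reports CodeJIT's gains over state-of-the-art JIT-VD methods as percentages in Recall, Precision, and F1. Assuming these are relative improvements over the baselines (the abstract does not state this explicitly), the following Python sketch shows how such figures are computed from confusion-matrix counts, treating "dangerous" as the positive class. F1 is the harmonic mean of precision and recall. The counts are invented for illustration and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 for the 'dangerous' (positive) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, e.g. 0.2 means +20%."""
    return (new - old) / old


# Invented confusion counts for two detectors on the same test set
# (105 truly dangerous commits in both cases; illustrative only).
model = precision_recall_f1(tp=90, fp=10, fn=15)
baseline = precision_recall_f1(tp=60, fp=20, fn=45)

for name, new, old in zip(("Precision", "Recall", "F1"), model, baseline):
    print(f"{name}: {relative_improvement(new, old):+.0%}")
# Precision: +20%
# Recall: +50%
# F1: +35%
```

Under this reading, an improvement of "136% in Precision" would mean the new precision is 2.36 times the baseline's, not 136 percentage points higher.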