Software vulnerabilities can cause severe harm to a computing system. They can lead to system crashes, privacy leakage, or even physical damage. Identifying vulnerabilities correctly and promptly within large code bases is an essential prerequisite to patching them. Unfortunately, current vulnerability identification methods, both classic and deep-learning-based, have several critical drawbacks that leave them unable to meet the present-day demands of the software industry. To overcome these drawbacks, in this paper we propose DeepVulSeeker, a novel, fully automated vulnerability identification framework that leverages both code graph structures and semantic features with the help of the recently advanced Graph Representation Self-Attention and pre-training mechanisms. Our experiments show that DeepVulSeeker not only reaches an accuracy as high as 0.99 on traditional CWE datasets, but also outperforms all other existing methods on two highly complicated datasets. We also tested DeepVulSeeker in three case studies and found that it is able to understand the implications of the vulnerabilities. We have fully implemented DeepVulSeeker and open-sourced it for future follow-up research.