Software vulnerabilities, caused by unintentional flaws in source code, are the main root cause of cyberattacks. Static analysis of source code has been used extensively to detect the unintentional defects, i.e., vulnerabilities, that software developers introduce into source code. In this paper, we propose a deep learning approach that detects vulnerabilities from the LLVM IR representation of source code, based on techniques used in natural language processing. The proposed approach is hierarchical: it first identifies source code that contains vulnerabilities, and then identifies the lines of code that contribute to the vulnerability within the detected source code. This two-step approach reduces false alarms in detecting vulnerable lines. Our extensive experiments on real-world and synthetic code collected from NVD and SARD show high accuracy (about 98\%) in detecting source code vulnerabilities.
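
The two-step process described above can be illustrated with a minimal Python sketch. It is not the paper's model: `file_score` and `line_score` are hypothetical keyword heuristics standing in for the learned file-level and line-level classifiers, the thresholds are arbitrary, and the LLVM IR lines are hand-written toy inputs. The sketch only shows how gating line-level detection on a file-level decision can suppress false alarms in code that was never flagged.

```python
from typing import Dict, List

# Hypothetical stand-ins for the two learned models. The abstract does not
# describe the actual networks, so simple keyword heuristics are used here.

def file_score(ir_lines: List[str]) -> float:
    """Toy file-level score: grows with the number of calls to @strcpy."""
    hits = sum("@strcpy" in line for line in ir_lines)
    return min(1.0, hits * 0.6)

def line_score(line: str) -> float:
    """Toy line-level score: deliberately noisy, fires on any getelementptr."""
    return 0.9 if "@strcpy" in line or "getelementptr" in line else 0.1

def detect(files: Dict[str, List[str]],
           file_threshold: float = 0.5,
           line_threshold: float = 0.5) -> Dict[str, List[int]]:
    """Return {file name: 1-based line numbers flagged as vulnerable}."""
    report: Dict[str, List[int]] = {}
    for name, ir_lines in files.items():
        # Step 1: decide whether the file as a whole looks vulnerable.
        if file_score(ir_lines) < file_threshold:
            continue  # line-level detection is skipped for unflagged files
        # Step 2: localize suspicious lines only inside flagged files.
        report[name] = [i for i, line in enumerate(ir_lines, start=1)
                        if line_score(line) >= line_threshold]
    return report

# Hand-written toy IR, for illustration only.
files = {
    "vuln.ll": [
        "%buf = alloca [8 x i8]",
        "%p = getelementptr inbounds [8 x i8], [8 x i8]* %buf, i64 0, i64 0",
        "%r = call i8* @strcpy(i8* %p, i8* %src)",
    ],
    "clean.ll": [
        "%arr = alloca [4 x i32]",
        "%q = getelementptr inbounds [4 x i32], [4 x i32]* %arr, i64 0, i64 1",
        "store i32 7, i32* %q",
    ],
}

two_step = detect(files)                        # file-level gate enabled
line_only = detect(files, file_threshold=0.0)   # gate disabled for comparison
print(two_step)
print(line_only)
```

With the gate disabled, the noisy line scorer also flags a line in `clean.ll`; with the gate enabled, that file is never passed to the line-level step, which is the false-alarm reduction the two-step design aims for.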