DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation

Context: Attacks on vulnerabilities in IoT system infrastructure platforms have become the main battlefield of network security attacks. Most traditional vulnerability mining methods rely on vulnerability detection tools to discover vulnerabilities. However, because these tools are inflexible and limited by file size, their scalability is relatively low and they cannot be applied to large-scale power big data. Objective: This research aims to intelligently detect vulnerabilities in source code written in high-level languages such as C/C++. To this end, we propose a code representation built from slices of source code related to sensitive statements, and we detect vulnerabilities with a distributed deep ensemble learning model. Method: This paper proposes a new directional vulnerability mining method based on parallel ensemble learning to address vulnerability mining on large-scale data. Sensitive functions and statements are extracted to form a library of sensitive statements from vulnerable code. Higher-granularity vulnerability code slices based on AST streams are vectorized at the sentence level with doc2vec through a random sampling module; Bi-LSTM trainers, trained in a distributed manner, produce separate classification results, and the final classification is obtained by voting. Results: We design and implement DCDetector, a software vulnerability mining system based on distributed deep ensemble learning. It makes accurate predictions using the syntactic information of the code and is an effective method for analyzing large-scale vulnerability data. Conclusion: Experiments show that the method reduces the false positive rate of traditional static analysis and improves the performance and accuracy of machine learning.
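The sample, train, and vote flow described in the Method can be illustrated with a minimal Python sketch. It is not DCDetector's implementation. The abstract does not specify the sampling scheme or the voting rule, so the sketch assumes random subsampling without replacement and a simple majority vote. A toy threshold learner on a one-dimensional feature stands in for a Bi-LSTM trained on doc2vec vectors, and every function name here is hypothetical.

```python
import random
from collections import Counter


def random_subsample(data, sample_size, rng):
    """Stand-in for the random sampling module: draw items without replacement."""
    return rng.sample(data, sample_size)


def train_threshold_learner(sample):
    """Toy stand-in for one Bi-LSTM trainer (assumption, not the paper's model).

    Each item is (feature, label), label 1 = vulnerable, 0 = not vulnerable.
    Assumes vulnerable items tend to have larger feature values and places
    the decision threshold midway between the two class means.
    """
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate sample with a single class: always predict that class.
        only_label = 1 if pos else 0
        return lambda x: only_label
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def train_ensemble(data, n_learners, sample_size, rng):
    """Train each learner on its own random subsample (could run in parallel)."""
    return [
        train_threshold_learner(random_subsample(data, sample_size, rng))
        for _ in range(n_learners)
    ]


def majority_vote(predictions):
    """Return the most frequent label; use an odd learner count to avoid ties."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(learners, x):
    """Collect one prediction per learner and combine them by voting."""
    return majority_vote([learner(x) for learner in learners])


# Hypothetical toy data: (feature, label) pairs.
toy_data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
toy_learners = train_ensemble(toy_data, n_learners=5, sample_size=4,
                              rng=random.Random(0))
print(ensemble_predict(toy_learners, 0.95), ensemble_predict(toy_learners, 0.05))
```

Because each learner sees only its own subsample, the training calls are independent, which is what makes this kind of ensemble straightforward to distribute across workers.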
