On the Use of Deep Learning in Software Defect Prediction

Context: Automated software defect prediction (SDP) methods are increasingly applied, often using machine learning (ML) techniques. However, existing ML-based approaches rely on manually extracted features, which are cumbersome and time-consuming to produce and rarely capture the semantic information recorded in bug reporting tools. Deep learning (DL) techniques give practitioners the opportunity to automatically extract features and learn from more complex, high-dimensional data. Objective: This study systematically identifies, analyzes, summarizes, and synthesizes the current state of the use of DL algorithms for SDP in the literature. Method: We systematically selected a pool of 102 peer-reviewed studies and conducted a quantitative and qualitative analysis of the data extracted from them. Results: The main highlights are: (1) most studies applied supervised DL; (2) two thirds of the studies used metrics as input to DL algorithms; (3) the Convolutional Neural Network is the most frequently used DL algorithm. Conclusion: Based on our findings, we propose to (1) develop more comprehensive DL approaches that automatically capture the needed features; (2) use diverse software artifacts other than source code; (3) adopt data augmentation techniques to tackle the class imbalance problem; (4) publish replication packages.
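
To make recommendation (3) concrete, the following minimal Python sketch balances a toy defect dataset by random oversampling, one simple stand-in for data augmentation. The helper name `oversample_minority`, the toy metric vectors, and the choice of random oversampling are assumptions made for illustration. They are not taken from the surveyed studies.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class is as large as the largest one.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Add random duplicates until the class reaches the target size.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (assumed): each sample is (lines_of_code, cyclomatic_complexity);
# label 1 marks a defective module, label 0 a clean one.
X = [(120, 4), (300, 9), (80, 2), (450, 15), (60, 1), (900, 30)]
y = [0, 0, 0, 0, 0, 1]

X_bal, y_bal = oversample_minority(X, y)
print(Counter(y_bal))
```

In practice, resampling of this kind is applied to the training split only, so that duplicated samples do not leak into the evaluation data.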
