Software Vulnerability Prediction (SVP) is a data-driven technique for software quality assurance that has recently gained considerable attention in the Software Engineering research community. However, the difficulty of preparing Software Vulnerability (SV) related data remains the main barrier to industrial adoption. Despite this problem, there have been no systematic efforts to analyse existing SV data preparation techniques and challenges. Without such insights, we are unable to overcome the challenges and advance this research domain. Hence, we conduct a Systematic Literature Review (SLR) of SVP research to synthesize and understand the data considerations, challenges and solutions that SVP researchers report. From our set of primary studies, we identify the main practices for each data preparation step. We then present a taxonomy of 16 key data challenges relating to six themes, which we further map to six categories of solutions. However, these solutions are far from complete, and several issues remain ill-considered. We also provide recommendations for future areas of SV data research. Our findings help illuminate the key SV data practices and considerations for SVP researchers and practitioners, and inform the validity of current SVP approaches.