Software defect prediction aims to identify the software modules most likely to contain defects. The goal is to save time during development by helping developers find bugs early. Defect prediction models are built from historical data; in particular, one can use data collected from past releases, or versions, of the application under analysis. Defect prediction based on past versions is called Cross Version Defect Prediction (CVDP). Traditionally, static code metrics are used to predict defects. In this work, we use the Class Dependency Network (CDN) as an additional predictor of defects, combined with static code metrics. CDN data captures structural information about the application being analyzed. Usually, CDN data is analyzed with handcrafted network measures, such as social network metrics. Our approach instead uses network embedding techniques to exploit CDN information without building such metrics manually. To use the embeddings across versions, we incorporate different embedding alignment techniques. To evaluate our approach, we performed experiments on 24 software release pairs and compared it against several benchmark methods. In these experiments, we analyzed the performance of two graph embedding techniques, three anchor selection approaches, and two alignment techniques. We also built a meta-model based on two different embeddings and achieved a statistically significant improvement in AUC of 4.7% (p < 0.002) over the baseline method.
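The abstract does not name the two alignment techniques, so the following is only a minimal sketch of one common choice, orthogonal Procrustes alignment, which is an assumption here and not necessarily the method used in the work. It supposes that each version's CDN has already been embedded into vectors of the same dimension, and that a set of anchor classes present in both versions has been selected. The function names `procrustes_rotation` and `align_embeddings` are hypothetical.

```python
import numpy as np


def procrustes_rotation(src_anchors, tgt_anchors):
    """Return the orthogonal matrix R minimizing ||src_anchors @ R - tgt_anchors||_F.

    Both inputs are (n_anchors, dim) arrays whose rows correspond to the
    same anchor classes in the older and newer version, respectively.
    """
    # Closed-form solution: SVD of the cross-covariance of the anchor rows.
    u, _, vt = np.linalg.svd(src_anchors.T @ tgt_anchors)
    return u @ vt


def align_embeddings(src_emb, tgt_emb, anchors):
    """Map every source-version vector into the target-version space.

    src_emb, tgt_emb: dicts mapping class name -> 1-D embedding vector.
    anchors: class names present in both versions (assumed to be given;
    how they are chosen is a separate design decision).
    """
    src_matrix = np.stack([src_emb[name] for name in anchors])
    tgt_matrix = np.stack([tgt_emb[name] for name in anchors])
    rotation = procrustes_rotation(src_matrix, tgt_matrix)
    # Apply the same rotation to all source vectors, anchors or not.
    return {name: vec @ rotation for name, vec in src_emb.items()}
```

After alignment, vectors from the older version can be compared with, or fed to the same classifier as, vectors from the newer version. Because the rotation is orthogonal, distances and angles within the older version's embedding are preserved.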