Predicting the Number of Reported Bugs in a Software Repository

Predicting bug growth patterns is a complicated, unresolved task that needs considerable attention. Knowing in advance how many bugs are likely to be discovered in a software system helps developers allocate sufficient resources in a timely manner. Developers may also use this information to take the actions needed to improve the quality of the system and, in turn, customer satisfaction. In this study, we examine eight time series forecasting models, including Long Short-Term Memory (LSTM) neural networks, autoregressive integrated moving average (ARIMA), and Random Forest Regressor. Further, we assess the impact of exogenous variables, such as software release dates, by incorporating them into the prediction models. We analyze the quality of long-term prediction for each model using several performance metrics. The assessment is conducted on Mozilla, a large open-source software application. The dataset was mined from Bugzilla and contains the number of bugs reported for the project between January 2010 and December 2019. Our numerical analysis provides insights into evaluating trends in a bug repository. We observe that LSTM is effective for long-run predictions, whereas Random Forest Regressor enriched with exogenous variables performs better at predicting the number of bugs in the short term.
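As a rough illustration of how an exogenous variable such as a release date can enter a forecasting setup, the sketch below turns a bug-count series into lagged feature rows with a release indicator attached, then scores a naive "repeat the last value" baseline with two common error metrics. The counts, the release flags, the helper names (`make_lagged_rows`, `mae`, `rmse`), the choice of two lags, and the choice of metrics are all assumptions made for illustration. They are not the study's data, features, or evaluation protocol.

```python
import math


def make_lagged_rows(counts, release_flags, n_lags):
    """Build (features, target) rows for supervised regression.

    Each feature vector holds the previous n_lags counts followed by the
    exogenous release flag of the period being predicted.
    """
    rows = []
    for t in range(n_lags, len(counts)):
        features = counts[t - n_lags:t] + [release_flags[t]]
        rows.append((features, counts[t]))
    return rows


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Invented monthly bug counts and release indicator (1 = release that month).
counts = [120, 135, 128, 160, 142, 150, 171, 149]
release_flags = [0, 0, 0, 1, 0, 0, 1, 0]

N_LAGS = 2
rows = make_lagged_rows(counts, release_flags, N_LAGS)

actual = [target for _, target in rows]
# Naive baseline: predict the most recent observed count (the last lag).
naive = [features[N_LAGS - 1] for features, _ in rows]

baseline_mae = mae(actual, naive)
baseline_rmse = rmse(actual, naive)
print("baseline MAE:", baseline_mae)
print("baseline RMSE:", baseline_rmse)
```

Rows of this shape could be passed to any regressor that accepts tabular input. A baseline like this gives a reference point against which a trained model's error can be compared.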
