Many researchers assume that, for software analytics, "more data is better". We write to show that, at least for learning defect predictors, this may not be true. To demonstrate this, we analyzed hundreds of popular GitHub projects. These projects ran for 84 months and contained 3,728 commits (median values). Across these projects, most of the defects occur very early in their life cycle. Hence, defect predictors learned from the first 150 commits and four months of data perform just as well as anything else. This means that, at least for the projects studied here, after the first few months we need not continually update our defect prediction models. We hope these results inspire other researchers to adopt a "simplicity-first" approach to their work. Indeed, some domains require a complex and data-hungry analysis. But before assuming complexity, it is prudent to check the raw data, looking for "shortcuts" that simplify the whole analysis.
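To make the comparison concrete, here is a minimal sketch of the kind of experiment the abstract describes: train one defect predictor on only the first 150 commits, another on the full history, and score both on a later, held-out window of commits. This is not the authors' actual pipeline; the features, the synthetic data generator, and the choice of classifier are all illustrative assumptions standing in for process metrics mined from a real project.

```python
# Hedged sketch: compare an "early life cycle" defect predictor against one
# trained on the full commit history. All data below is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n_commits = 3728                          # median project size reported above

# Assumption: defects skew toward early life, so the probability that a
# commit is buggy decays with commit index.
p_buggy = 0.4 * np.exp(-np.arange(n_commits) / 500) + 0.05
y = rng.random(n_commits) < p_buggy

# Four hypothetical commit-level features (e.g. churn, files touched, ...);
# the first is given a weak association with bugginess so there is a signal
# for the model to learn.
X = rng.normal(size=(n_commits, 4))
X[:, 0] += 1.5 * y

# Chronological split: the last 20% of commits is the test window.
# (A random shuffle would leak future information into training.)
split = int(0.8 * n_commits)
X_test, y_test = X[split:], y[split:]

for label, cutoff in [("first 150 commits", 150), ("full history", split)]:
    model = LogisticRegression().fit(X[:cutoff], y[:cutoff])
    r = recall_score(y_test, model.predict(X_test))
    print(f"{label:>18}: recall = {r:.2f}")
```

Under these assumptions the two models score similarly, which is the shape of the result the abstract reports: the early window already contains the signal that later data would provide.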