In this paper, we present ApacheJIT, a large dataset for Just-In-Time (JIT) defect prediction. ApacheJIT consists of clean and bug-inducing software changes in popular Apache projects. It contains 106,674 commits in total: 28,239 bug-inducing and 78,435 clean. This size makes ApacheJIT well suited to machine learning models, especially deep learning models, which require large training sets to generalize patterns in historical data to future data.