Deep Learning for Agile Effort Estimation: Have We Solved the Problem Yet?

In the last decade, several studies have proposed the use of automated techniques to estimate the effort of agile software development. In this paper, we perform a close replication and extension of a seminal work proposing the use of Deep Learning for agile effort estimation (namely Deep-SE), which has set the state of the art ever since. Specifically, we replicate three of the original research questions, which investigate the effectiveness of Deep-SE for both within-project and cross-project effort estimation. As in the original study, we benchmark Deep-SE against three baseline techniques (i.e., Random, Mean and Median effort prediction) and a previously proposed method for estimating agile software development effort (dubbed TF/IDF-SE). To this end, we use both the data from the original study and a new, larger dataset of 31,960 issues mined from 29 open-source projects. Using more data strengthens our confidence in the results and further mitigates the threat to the external validity of the study. We also extend the original study by investigating two additional research questions. One evaluates the accuracy of Deep-SE when the training set is augmented with issues from all other projects available in the repository at the time of estimation; the other examines whether an expensive pre-training step used by the original Deep-SE has any beneficial effect on its accuracy and convergence speed. The results of our replication show that Deep-SE outperforms the Median baseline estimator and TF/IDF-SE with statistical significance in only very few cases (8/42 and 9/32 cases, respectively), thus confounding previous findings on the efficacy of Deep-SE. The two additional RQs reveal that neither augmenting the training set nor pre-training Deep-SE plays a role in improving its accuracy and convergence speed. ...
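To make the three baseline estimators concrete, the following is a minimal Python sketch of how Random, Mean and Median effort prediction could be scored on a held-out set of issues. It is an illustration under stated assumptions, not the code used in the study: the abstract does not name an error measure, so mean absolute error is assumed here; the Random baseline is assumed to copy the story points of a randomly sampled training issue; and all function names and the toy story-point values are hypothetical.

```python
import random
import statistics


def mean_baseline(train_points):
    """Predict the mean story points of the training issues."""
    return statistics.mean(train_points)


def median_baseline(train_points):
    """Predict the median story points of the training issues."""
    return statistics.median(train_points)


def random_baseline(train_points, rng):
    """Assumed definition: reuse the story points of a randomly sampled training issue."""
    return rng.choice(train_points)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def evaluate_baselines(train_points, test_points, seed=0):
    """Return a dict mapping each baseline name to its MAE on the test issues."""
    rng = random.Random(seed)  # seeded so the Random baseline is reproducible
    n = len(test_points)
    predictions = {
        "Mean": [mean_baseline(train_points)] * n,
        "Median": [median_baseline(train_points)] * n,
        "Random": [random_baseline(train_points, rng) for _ in range(n)],
    }
    return {
        name: mean_absolute_error(test_points, preds)
        for name, preds in predictions.items()
    }


if __name__ == "__main__":
    # Hypothetical story points, for illustration only.
    train = [1, 2, 3, 5, 8, 13]
    test = [2, 3, 5]
    for name, mae in evaluate_baselines(train, test).items():
        print(f"{name}: MAE = {mae:.2f}")
```

Because story-point distributions tend to be skewed, the Median estimator is often a harder baseline to beat than the Mean, which is one reason a comparison against it is informative. A learned model such as Deep-SE would be scored with the same error function on the same test issues.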
