Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study

In the last decade, several studies have explored automated techniques to estimate the effort of agile software development. We perform a close replication and extension of a seminal work proposing the use of Deep Learning for Agile Effort Estimation (namely Deep-SE), which has set the state of the art since. Specifically, we replicate three of the original research questions, which investigate the effectiveness of Deep-SE for both within-project and cross-project effort estimation. As in the original study, we benchmark Deep-SE against three baselines (i.e., Random, Mean and Median effort estimators) and a previously proposed method for estimating agile software development effort (dubbed TF/IDF-SVM). To this end, we use the data from the original study and an additional dataset of 31,960 issues mined from TAWOS, since more data strengthens confidence in the results and further mitigates threats to external validity. The results of our replication show that Deep-SE outperforms the Median baseline estimator and TF/IDF-SVM with statistical significance in only very few cases (8/42 and 9/32 cases, respectively), thus contradicting previous findings on the efficacy of Deep-SE. The two additional RQs reveal that neither augmenting the training set nor pre-training Deep-SE leads to an improvement in its accuracy or convergence speed. These results suggest that semantic similarity alone is not enough to differentiate user stories with respect to their story points; future work has yet to find new techniques and features that yield accurate agile software development estimates.
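To make the baseline comparison concrete, here is a minimal sketch of how Random, Mean and Median effort estimators might be computed and scored. It rests on several assumptions not stated in the abstract: the Random baseline is assumed to pick the story points of a randomly chosen training issue, mean absolute error is assumed as the accuracy measure, and the function names and story-point values are hypothetical, made up purely for illustration.

```python
import random
import statistics


def mean_estimator(train_points, n_test):
    """Predict the mean of the training story points for every test issue."""
    return [statistics.mean(train_points)] * n_test


def median_estimator(train_points, n_test):
    """Predict the median of the training story points for every test issue."""
    return [statistics.median(train_points)] * n_test


def random_estimator(train_points, n_test, seed=0):
    """Assumed Random baseline: reuse the story points of a random training issue."""
    rng = random.Random(seed)
    return [rng.choice(train_points) for _ in range(n_test)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical story points (not taken from the study's datasets).
train = [1, 2, 3, 5, 8, 13]
test = [2, 5, 8]

random_predictions = random_estimator(train, len(test))
mae_mean = mean_absolute_error(test, mean_estimator(train, len(test)))
mae_median = mean_absolute_error(test, median_estimator(train, len(test)))
mae_random = mean_absolute_error(test, random_predictions)

print(f"Mean baseline MAE:   {mae_mean:.3f}")
print(f"Median baseline MAE: {mae_median:.3f}")
print(f"Random baseline MAE: {mae_random:.3f}")
```

A learned model such as Deep-SE would be expected to achieve a clearly lower error than these constant or random predictions; the replication reports that this happens with statistical significance in only a few cases.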
