About 40% of software bug reports are duplicates of one another, which poses a major overhead during software maintenance. Traditional techniques often focus on detecting duplicate bug reports that are textually similar. However, many duplicate bug reports in bug tracking systems are not textually similar, and the traditional techniques might fall short for them. In this paper, we conduct a large-scale empirical study to better understand the impact of textual dissimilarity on the detection of duplicate bug reports. First, we collect a total of 92,854 bug reports from three open-source systems and construct two datasets, one containing textually similar duplicate bug reports and the other containing textually dissimilar ones. We then evaluate three existing techniques for detecting duplicate bug reports and show that they perform significantly worse on textually dissimilar duplicates. Second, we analyze the two groups of bug reports using a combination of descriptive analysis, word embedding visualization, and manual analysis. We find that textually dissimilar duplicate bug reports often miss important components (e.g., expected behaviors and steps to reproduce), which could explain both their textual differences and the poor performance of the existing techniques. Finally, we apply domain-specific embeddings to the duplicate bug report detection problem, which yields mixed results. These findings warrant further investigation and more effective solutions for detecting textually dissimilar duplicate bug reports.
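To make the distinction between textually similar and textually dissimilar duplicates concrete, the following minimal sketch separates pairs of duplicate reports by the cosine similarity of their term-frequency vectors. It is an illustration only, not the procedure used in the study: the bag-of-words measure, the function names `cosine_similarity` and `split_duplicates`, the 0.5 threshold, and the sample reports are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty report shares nothing with any other report
    return dot / (norm_a * norm_b)


def split_duplicates(pairs, threshold=0.5):
    """Partition known duplicate pairs into (similar, dissimilar) lists.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    similar, dissimilar = [], []
    for report_a, report_b in pairs:
        if cosine_similarity(report_a, report_b) >= threshold:
            similar.append((report_a, report_b))
        else:
            dissimilar.append((report_a, report_b))
    return similar, dissimilar


# Made-up duplicate pairs: the first shares most of its wording,
# the second describes the same failure with no words in common.
pairs = [
    ("App crashes when saving a file", "App crashes when saving file"),
    ("App crashes when saving a file", "NullPointerException in DocumentWriter.flush"),
]
similar, dissimilar = split_duplicates(pairs)
print(len(similar), len(dissimilar))
```

The second pair shows why similarity-based detection can miss a duplicate: both reports may describe the same defect, yet they share no vocabulary, so any purely lexical measure scores them as unrelated.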