Traceability establishes trace links among software artifacts based on whether two artifacts are related through system functionality. These traces are valuable for software development but difficult to obtain manually. To cope with costly and error-prone manual recovery, automated approaches have been proposed that recover traces through textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, the low quality and quantity of artifact texts negatively impact the calculated IR values, greatly hindering the performance of IR-based approaches. In this study, we propose extracting co-occurring word pairs from the text structures of both requirements and code (i.e., consensual biterms) to improve IR-based traceability recovery. We first collect a set of biterms based on the part-of-speech tags of requirement texts and then filter them through the code texts. We then use these consensual biterms both to enrich the input corpus for IR techniques and to enhance the calculation of IR values. An evaluation on nine systems shows that, in general, when used alone to enhance IR techniques, our approach outperforms pure IR-based approaches and another baseline by 21.9% and 21.8% in AP, and by 9.3% and 7.2% in MAP, respectively. Moreover, when combined with another enhancing strategy that works from a different perspective, it outperforms this baseline by 5.9% in AP and 4.8% in MAP.
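The pipeline summarized above (collect requirement biterms by part of speech, filter them through the code, enrich the corpus) can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the study: the input is assumed to be already POS-tagged, the tag set `CONTENT_TAGS`, the rule that a biterm survives only if both words co-occur in a single code identifier, and every function name below are hypothetical simplifications chosen for illustration.

```python
import re
from itertools import combinations

# Assumption: only these coarse part-of-speech tags count as content words.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def requirement_biterms(tagged_sentence):
    """Return unordered pairs of distinct content words from one POS-tagged sentence."""
    words = {w.lower() for w, tag in tagged_sentence if tag in CONTENT_TAGS}
    return {tuple(sorted(pair)) for pair in combinations(words, 2)}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into a set of lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return {p.lower() for p in parts}


def consensual_biterms(req_biterms, code_identifiers):
    """Keep requirement biterms whose two words also co-occur in one code identifier."""
    term_sets = [split_identifier(name) for name in code_identifiers]
    return {b for b in req_biterms if any(set(b) <= terms for terms in term_sets)}


def enrich(tokens, biterms):
    """Append each biterm whose two words both occur in tokens as one joined token."""
    present = set(tokens)
    extra = ["_".join(b) for b in sorted(biterms) if set(b) <= present]
    return tokens + extra


# Hypothetical requirement sentence (pre-tagged) and code identifiers.
tagged = [
    ("The", "DET"), ("system", "NOUN"), ("shall", "AUX"), ("update", "VERB"),
    ("the", "DET"), ("patient", "NOUN"), ("record", "NOUN"),
]
identifiers = ["updatePatientRecord", "loginUser"]

candidates = requirement_biterms(tagged)
shared = consensual_biterms(candidates, identifiers)
enriched = enrich(["update", "patient", "record"], shared)
print(sorted(shared))
print(enriched)
```

In this sketch the enriched token list would then be passed to any IR model (for example a vector space model) in place of the raw tokens, so that artifacts sharing a consensual biterm gain an extra matching term. How the biterms enter the IR value calculation itself is not shown here.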