Software requirements traceability is a critical component of the software engineering process, enabling activities such as requirements validation, compliance verification, and safety assurance. However, the cost and effort of manually creating a complete set of trace links across natural language artifacts such as requirements, design, and test cases can be prohibitively expensive. Researchers have therefore proposed automated link-generation solutions, primarily based on information-retrieval (IR) techniques; however, these solutions have failed to deliver the accuracy needed for full adoption in industrial projects. Improvements can be achieved using deep-learning traceability models; however, their efficacy is impeded by the limited size and availability of project-level artifacts and links to serve as training data. In this paper, we address this problem by proposing and evaluating several deep-learning approaches for text-to-text traceability. Our method, named NLTrace, explores three transfer-learning strategies that use datasets mined from open-world platforms. Through pretraining Language Models (LMs) and leveraging adjacent tracing tasks, we demonstrate that NLTrace can significantly improve the performance of LM-based trace models when training links are available. In such scenarios, NLTrace outperforms the best-performing classical IR method with a 188% improvement in F2 score and a 94.01% improvement in Mean Average Precision (MAP). It also outperforms the general LM-based trace model by 7% and 23% for F2 and MAP, respectively. In addition, NLTrace can adapt to low-resource tracing scenarios where other LM models cannot. The knowledge learned from adjacent tasks enables NLTrace to outperform VSM models by 28% F2 on generation challenges when presented with a small number of training examples.