Software requirements traceability is a critical component of the software engineering process, enabling activities such as requirements validation, compliance verification, and safety assurance. However, the cost and effort of manually creating a complete set of trace links across natural language artifacts such as requirements, design, and test cases can be prohibitively expensive. Researchers have therefore proposed automated link-generation solutions, primarily based on information-retrieval (IR) techniques; however, these solutions have failed to deliver the accuracy needed for full adoption in industrial projects. Improvements can be achieved using deep-learning traceability models; however, their efficacy is impeded by the limited size and availability of project-level artifacts and links to serve as training data. In this paper, we address this problem by proposing and evaluating several deep-learning approaches for text-to-text traceability. Our method, named NLTrace, explores three transfer-learning strategies that use datasets mined from open-world platforms. Through pretraining Language Models (LMs) and leveraging adjacent tracing tasks, we demonstrate that NLTrace can significantly improve the performance of LM-based trace models when training links are available. In such scenarios, NLTrace outperforms the best-performing classical IR method with a 188% improvement in F2 score and a 94.01% improvement in Mean Average Precision (MAP). It also outperforms the general LM-based trace model by 7% and 23% for F2 and MAP, respectively. In addition, NLTrace can adapt to low-resource tracing scenarios where other LM models cannot. The knowledge learned from adjacent tasks enables NLTrace to outperform VSM models by 28% F2 on generation challenges when presented with a small number of training examples.