Software requirements specification is critical throughout the software life cycle. Today, writing requirements specifications still depends primarily on human effort. Although many studies have sought to speed up the process with advanced elicitation and analysis techniques, it remains a time-consuming and error-prone task that must take domain knowledge and business information into account. In this paper, we propose an approach named ReqGen, which provides recommendations by automatically generating natural language requirements specifications from given keywords. Specifically, ReqGen consists of three critical steps. First, keyword-oriented knowledge is selected from a domain ontology and injected into the basic Unified pre-trained Language Model (UniLM) for domain fine-tuning. Second, a copy mechanism is integrated to ensure that the keywords occur in the generated statements. Finally, requirement-syntax-constrained decoding is designed to narrow the semantic and syntactic distance between the candidate and reference specifications. Experiments on two public datasets from different groups and domains show that ReqGen outperforms six popular natural language generation approaches with respect to the hard constraint of keyword (phrase) inclusion, BLEU, ROUGE, and syntax compliance. We believe that ReqGen can promote the efficiency and intelligence of specifying software requirements.
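To make the hard constraint of keyword (phrase) inclusion concrete, the following is a minimal sketch of how such a check could be computed for a generated statement. It is an illustration only, not the evaluation code used for ReqGen: the function names `keyword_coverage` and `satisfies_hard_constraint`, the case-insensitive matching, and the word-boundary rule are all assumptions.

```python
import re


def keyword_coverage(keywords, statement):
    """Return the fraction of keywords (or phrases) found in the statement.

    Hypothetical helper: the matching rules here are assumptions,
    not the metric definition used in the ReqGen experiments.
    """
    if not keywords:
        return 1.0
    # Lowercase and collapse whitespace so phrases match across spacing differences.
    normalized = " ".join(statement.lower().split())
    hits = 0
    for kw in keywords:
        kw_norm = " ".join(kw.lower().split())
        # Require word boundaries so "log" does not match inside "login".
        pattern = r"(?<!\w)" + re.escape(kw_norm) + r"(?!\w)"
        if re.search(pattern, normalized):
            hits += 1
    return hits / len(keywords)


def satisfies_hard_constraint(keywords, statement):
    """True only if every keyword or phrase occurs in the statement."""
    return keyword_coverage(keywords, statement) == 1.0
```

A generated statement would pass this check only when every given keyword or phrase appears in it; a partial score from `keyword_coverage` could serve as a softer diagnostic when comparing approaches.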