Advances in software development methodologies have drawn developers' attention to automatic Requirement Formalisation (RF) in the Requirement Engineering (RE) field. Several studies have reported the potential benefits of applying Natural Language Processing (NLP) and Machine Learning (ML) to reduce the ambiguity and incompleteness of requirements written in natural language. The goal of this paper is to survey and classify existing work on NLP and ML for RF, identify the challenges in this domain, and propose promising directions for future research. To achieve this, we conducted a systematic literature review of the current state of the art in NLP and ML techniques for RF, starting from 257 papers retrieved from commonly used digital libraries. We filtered the search results by applying defined inclusion and exclusion criteria, which yielded 47 relevant studies published between 2012 and 2022. We found that heuristic NLP approaches are the most common NLP techniques used for automatic RF, and that they primarily operate on structured and semi-structured data. This study also revealed that Deep Learning (DL) techniques are not widely used; classical ML techniques predominate in the surveyed studies instead. More importantly, we identified that the performance of different approaches is difficult to compare because standard benchmark cases for RF are lacking.