Software configurations play a crucial role in determining the behavior of software systems. To ensure safe and error-free operation, it is necessary to identify the correct configurations along with their valid bounds and rules, which are commonly referred to as software specifications. As software systems grow in complexity and scale, the number of configurations and associated specifications required to ensure correct operation can become too large to manage manually. Because of the fast pace of software development, correct software specifications are often not thoroughly checked or validated within the software itself. Instead, they are frequently discussed and documented in a variety of external sources, including software manuals, code comments, and online discussion forums. As a result, it is hard for system administrators to determine the correct specifications of configurations, since the information lacks clarity and organization and there is no centralized, unified source to consult. To address this challenge, we propose SpecSyn, a framework that leverages a state-of-the-art large language model to automatically synthesize software specifications from natural language sources. Our approach formulates software specification synthesis as a sequence-to-sequence learning problem and investigates the extraction of specifications from large contextual texts. This is the first work to use a large language model for end-to-end specification synthesis from natural language texts. Empirical results demonstrate that our system outperforms the prior state-of-the-art specification synthesis tool by 21% in terms of F1 score and can find specifications from single sentences as well as from multiple sentences.