The Object Constraint Language (OCL) is a declarative language that adds constraints and object query expressions to MOF models. Despite its potential to bring precision and conciseness to UML models, OCL's unfamiliar syntax has hindered its adoption. Recent large language models (LLMs), such as GPT-3, have shown strong capabilities on many NLP tasks, including semantic parsing and text generation. Codex, a descendant of GPT-3 fine-tuned on publicly available code from GitHub, can generate code in many programming languages. We investigate the reliability of OCL constraints generated by Codex from natural-language specifications. To this end, we compiled a dataset of 15 UML models and 168 specifications and crafted a prompt template with slots that we populate with UML information and the target task, under both zero-shot and few-shot learning. Measuring the syntactic validity and execution accuracy of the generated OCL constraints, we found that enriching the prompts with UML information and enabling few-shot learning increases their reliability. Furthermore, the results reveal a close sentence-embedding similarity between the generated OCL constraints and the human-written ones in the ground truth, implying a level of clarity and understandability in the OCL constraints generated by Codex.
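The slotted prompt template described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, slot layout, and the example UML/OCL snippets are hypothetical, not the authors' actual template.

```python
# Hypothetical sketch of a slotted prompt template: one slot for UML model
# context, optional few-shot (specification, OCL) demonstration pairs, and a
# final slot for the target specification whose OCL constraint Codex completes.

def build_prompt(uml_context: str, specification: str, examples=None) -> str:
    """Assemble a completion prompt from UML information and the target task."""
    parts = []
    if uml_context:  # zero-shot prompts may still carry UML context
        parts.append(f"-- UML model context:\n{uml_context}")
    for ex_spec, ex_ocl in (examples or []):  # few-shot demonstrations
        parts.append(f"-- Specification: {ex_spec}\n{ex_ocl}")
    # Leave the constraint unfinished so the model continues from "context".
    parts.append(f"-- Specification: {specification}\ncontext")
    return "\n\n".join(parts)

prompt = build_prompt(
    uml_context="class Flight { attribute duration : Integer }",
    specification="A flight's duration must be positive.",
    examples=[("A plane has at least two seats.",
               "context Plane inv: self.seats >= 2")],
)
print(prompt)
```

Enriching the first slot with more UML detail, or adding demonstration pairs, corresponds to the prompt variants compared in the study.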