The Object Constraint Language (OCL) is a declarative language that adds constraints and object query expressions to MOF models. Despite its potential to bring precision and conciseness to UML models, OCL's unfamiliar syntax has hindered its adoption. Recent large language models (LLMs), such as GPT-3, have shown strong capabilities on many NLP tasks, including semantic parsing and text generation. Codex, a descendant of GPT-3 fine-tuned on publicly available code from GitHub, can generate code in many programming languages. We investigate the reliability of OCL constraints generated by Codex from natural-language specifications. To this end, we compiled a dataset of 15 UML models and 168 specifications and crafted a prompt template with slots that we populate with UML information and the target task, under both zero-shot and few-shot learning. Measuring the syntactic validity and execution accuracy of the generated OCL constraints, we found that enriching the prompts with UML information and enabling few-shot learning increases their reliability. Furthermore, the results reveal a close sentence-embedding similarity between the generated OCL constraints and the human-written ones in the ground truth, implying a level of clarity and understandability in the OCL constraints generated by Codex.
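The slotted prompt template described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, slot layout, and the example UML/OCL snippets are hypothetical, not the authors' actual template.

```python
# Hypothetical sketch of a slotted prompt template: one slot for UML model
# context, optional few-shot (specification, OCL) demonstration pairs, and a
# final slot for the target specification whose OCL constraint Codex completes.

def build_prompt(uml_context: str, specification: str, examples=None) -> str:
    """Assemble a completion prompt from UML information and the target task."""
    parts = []
    if uml_context:  # zero-shot prompts may still carry UML context
        parts.append(f"-- UML model context:\n{uml_context}")
    for ex_spec, ex_ocl in (examples or []):  # few-shot demonstrations
        parts.append(f"-- Specification: {ex_spec}\n{ex_ocl}")
    # Leave the constraint unfinished so the model continues from "context".
    parts.append(f"-- Specification: {specification}\ncontext")
    return "\n\n".join(parts)

prompt = build_prompt(
    uml_context="class Flight { attribute duration : Integer }",
    specification="A flight's duration must be positive.",
    examples=[("A plane has at least two seats.",
               "context Plane inv: self.seats >= 2")],
)
print(prompt)
```

Enriching the first slot with more UML detail, or adding demonstration pairs, corresponds to the prompt variants compared in the study.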