In the process of code generation, it is essential to guarantee that the generated code satisfies the grammar constraints of the programming language (PL). However, neglecting grammar constraints is a fatal drawback of commonly used sequence-based code generation. In this paper, we devise a pushdown automaton (PDA)-based methodology to address this problem, exploiting the principle that a PL is a subset of the language recognized by a PDA and that code accepted by the PDA is grammatical. Specifically, we construct a PDA module and design an algorithm that constrains the generation of sequence-based models to ensure grammatical correctness. Guided by this methodology, we further propose CodePAD, a sequence-based code generation framework equipped with a PDA module, which integrates PDA deduction into deep learning. Additionally, this framework can leverage the states of the PDA deduction (including state representation, a state prediction task, and joint prediction with state) to help models learn PDA deduction. To comprehensively evaluate CodePAD, we construct a PDA for Python and conduct extensive experiments on four public benchmark datasets. CodePAD can leverage existing sequence-based models, and we show that it achieves a 100\% grammatical correctness percentage on these benchmark datasets. It relatively improves CodeBLEU by 17\% on CONALA, EM by 8\% on DJANGO, and CodeBLEU by 15\% on JUICE-10K compared to base models. In addition, our method significantly enhances pre-trained models; e.g., the CodeBLEU of CodeGen-350M improves from 3.21 to 21.54 on MBPP in the zero-shot setting.
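The idea of constraining a sequence-based decoder with a PDA can be illustrated with a minimal sketch. The toy PDA below (over bracket tokens only, not the paper's actual Python PDA, and with hypothetical function names) tracks a stack of expected closing tokens; at each decoding step, only vocabulary tokens the PDA can accept are allowed, so any completed sequence is guaranteed to be well-bracketed.

```python
# Hypothetical minimal sketch of PDA-constrained decoding, not the
# paper's CodePAD implementation: a toy PDA over bracket tokens decides
# which next tokens keep a partial program grammatically completable.

OPEN = {"(": ")", "[": "]", "{": "}"}
CLOSE = set(OPEN.values())


def pda_step(stack, token):
    """Return the new stack if `token` is admissible, else None."""
    if token in OPEN:
        return stack + [OPEN[token]]  # push the expected closer
    if token in CLOSE:
        if stack and stack[-1] == token:
            return stack[:-1]  # pop the matched closer
        return None  # closing token with no matching opener: reject
    return stack  # ordinary tokens leave the stack unchanged


def allowed_next(prefix, vocab):
    """Mask the vocabulary: keep only tokens the PDA can accept next."""
    stack = []
    for tok in prefix:
        stack = pda_step(stack, tok)
        assert stack is not None, "prefix itself is ungrammatical"
    return [tok for tok in vocab if pda_step(stack, tok) is not None]


vocab = ["(", ")", "[", "]", "x", "+"]
# After "(" the stack expects ")", so "]" is masked out.
print(allowed_next(["(", "x"], vocab))  # → ['(', ')', '[', 'x', '+']
```

In a real framework, this mask would be applied to the model's logits before sampling each token, so the neural decoder can only ever emit sequences the PDA accepts.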