General-purpose code generation (GPCG) aims to automatically convert a natural language description into source code in a general-purpose language (GPL) such as Python. Intrinsically, code generation is a particular type of text generation that produces grammatically defined text, namely code. However, existing sequence-to-sequence (Seq2Seq) approaches neglect grammar rules when generating GPL code. In this paper, we make the first attempt to consider grammatical Seq2Seq (GSS) models for GPCG and propose CODEP, a GSS code generation framework equipped with a pushdown automaton (PDA) module. The PDA module (PDAM) contains a PDA and an algorithm that restricts the model's prediction at each generation step to a valid set, thereby ensuring the grammatical correctness of the generated code. During training, CODEP additionally incorporates a state representation and a state prediction task, which leverage PDA states to help CODEP comprehend the parsing process of the PDA. During inference, our method outputs code satisfying grammatical constraints through the PDAM and the joint prediction of PDA states. Furthermore, the PDAM can be directly applied to Seq2Seq models without any additional training. To evaluate the effectiveness of the proposed method, we construct the PDA for Python, the most popular GPL, and conduct extensive experiments on four benchmark datasets. Experimental results demonstrate the superiority of CODEP over state-of-the-art approaches without pre-training, and the PDAM also achieves significant improvements over pre-trained models.
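To make the constrained-decoding idea concrete, the following is a minimal sketch, not the authors' implementation, of how a PDA can bound each generation step to a valid token set: the automaton exposes the tokens that are currently grammatical, and the model's logits for all other tokens are masked out before selection. The `ToyPDA` below only tracks balanced parentheses over a tiny hypothetical vocabulary; the actual PDAM is constructed from the full Python grammar.

```python
# Hypothetical sketch of PDA-constrained decoding (toy grammar, not the paper's PDAM).
import math

class ToyPDA:
    """Toy pushdown automaton: identifiers and balanced parentheses, ended by <eos>."""
    def __init__(self):
        self.stack = []

    def valid_tokens(self, vocab):
        # Identifiers are always allowed; structural tokens depend on the stack.
        valid = {t for t in vocab if t not in ("(", ")", "<eos>")}
        valid.add("(")                 # opening a parenthesis is always grammatical
        if self.stack:
            valid.add(")")             # closing is allowed only if something is open
        else:
            valid.add("<eos>")         # generation may stop only with an empty stack
        return valid

    def push_token(self, token):
        # Advance the automaton after a token has been emitted.
        if token == "(":
            self.stack.append("(")
        elif token == ")":
            self.stack.pop()

def constrained_step(logits, vocab, pda):
    """Mask logits of tokens the PDA forbids, then greedily pick the best valid token."""
    valid = pda.valid_tokens(vocab)
    masked = [l if t in valid else -math.inf for t, l in zip(vocab, logits)]
    best = max(range(len(vocab)), key=lambda i: masked[i])
    pda.push_token(vocab[best])
    return vocab[best]

vocab = ["x", "(", ")", "<eos>"]
pda = ToyPDA()
# Fake logits from a model that would otherwise emit ")" first (ungrammatical here).
print(constrained_step([0.1, 0.3, 0.9, 0.2], vocab, pda))  # -> "(" rather than ")"
```

Because the mask is applied purely at decoding time, this style of constraint can be layered onto an already trained Seq2Seq model, which is the sense in which the PDAM requires no additional training.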