CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning - Zhuanzhi Papers

November 3, 2022

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C. H. Hoi

From arXiv. An earlier version of the work was accepted to NeurIPS 2022.

Program synthesis, or code generation, aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure that trains a code generation model only on pairs of natural-language problem descriptions and ground-truth programs. Such a paradigm largely ignores important and potentially useful signals in the problem specification, such as unit tests, which often results in poor performance on complex unseen coding tasks. To address these limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, we treat the code-generating LM as an actor network and introduce a critic network that is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor. During inference, we introduce a new generation procedure with a critical sampling strategy that allows a model to automatically regenerate programs based on feedback from example unit tests and critic scores. For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data. Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability with new SOTA results on the simpler MBPP benchmark.
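
The inference-time idea in the abstract (regenerate programs using feedback from example unit tests and critic scores) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: the `generate` and `critic` callables, the `solve` entry-point name, the `rounds` and `prefix_lines` parameters, and the rule of reusing the first lines of the highest-scored failed program as a seed are all assumptions made for illustration. The toy generator and critic at the bottom stand in for a real LM and a trained critic network.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces for this sketch (not taken from the paper's code).
Generator = Callable[[str, str], List[str]]  # (problem, seed prefix) -> candidate programs
Critic = Callable[[str, str], float]         # (problem, program) -> estimated correctness
UnitTest = Tuple[tuple, object]              # (positional args, expected return value)


def passes_tests(program: str, tests: List[UnitTest], entry: str = "solve") -> bool:
    """Run a candidate on example unit tests; any exception counts as a failure."""
    namespace: dict = {}
    try:
        # Unsafe outside a sandbox, and no timeout is enforced in this sketch.
        exec(program, namespace)
        fn = namespace[entry]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False


def generate_with_feedback(
    problem: str,
    tests: List[UnitTest],
    generate: Generator,
    critic: Critic,
    rounds: int = 2,
    prefix_lines: int = 1,
) -> Optional[str]:
    """Return the first candidate that passes the example tests, regenerating
    from a critic-selected seed when every candidate in a round fails."""
    seed = ""
    for _ in range(rounds):
        candidates = generate(problem, seed)
        if not candidates:
            return None
        passed = [p for p in candidates if passes_tests(p, tests)]
        if passed:
            return passed[0]
        # All candidates failed: keep the one the critic rates highest and
        # reuse its opening lines as the seed for the next round.
        best = max(candidates, key=lambda p: critic(problem, p))
        seed = "\n".join(best.splitlines()[:prefix_lines])
    return None


# Toy stand-ins so the sketch runs end to end.
def toy_generate(problem: str, seed: str) -> List[str]:
    if seed:
        return [seed + "\n    return a + b"]
    return [
        "def solve(a, b):\n    return a - b",  # runs, but gives the wrong answer
        "def solve(a, b)\n    return a * b",   # syntax error (missing colon)
    ]


def toy_critic(problem: str, program: str) -> float:
    return 0.9 if "def solve(a, b):" in program else 0.1


result = generate_with_feedback(
    "add two numbers", [((1, 2), 3)], toy_generate, toy_critic
)
print(result)
```

In this toy run both first-round candidates fail the example test, the critic prefers the syntactically valid one, and its first line seeds a second round that yields a passing program. A real system would execute candidates in an isolated sandbox with time limits.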
