DocCoder: 通过检索和阅读文档生成代码 (DocCoder: Generating Code by Retrieving and Reading Docs) - 专知论文

会员服务 ·

0

代码 · 泛函 · MoDELS · 基准 · Learning ·

2022 年 7 月 13 日

DocCoder: Generating Code by Retrieving and Reading Docs

翻译：DocCoder: 通过检索和阅读文档生成代码

Shuyan Zhou,Uri Alon,Frank F. Xu,Zhengbao JIang,Graham Neubig

Natural-language-to-code models learn to generate a code snippet given a natural language (NL) intent. However, the rapid growth of both publicly available and proprietary libraries and functions makes it impossible to cover all APIs using training examples, as new libraries and functions are introduced daily. Thus, existing models inherently cannot generalize to using unseen functions and libraries merely through incorporating them into the training data. In contrast, when human programmers write programs, they frequently refer to textual resources such as code manuals, documentation, and tutorials, to explore and understand available library functionality. Inspired by this observation, we introduce DocCoder: an approach that explicitly leverages code manuals and documentation by (1) retrieving the relevant documentation given the NL intent, and (2) generating the code based on the NL intent and the retrieved documentation. Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model. We demonstrate that DocCoder consistently improves NL-to-code models: DocCoder achieves 11x higher exact match accuracy than strong baselines on a new Bash dataset tldr; on the popular Python CoNaLa benchmark, DocCoder improves over strong baselines by 1.65 BLEU.

翻译：自然语言到代码模型学会根据自然语言(NL)的意图生成代码片断。然而,由于公共和专有图书馆和功能的迅速增长,由于新的图书馆和功能每天都在引入,因此无法利用培训实例涵盖所有API,因此,现有模型本身无法仅仅通过将隐性功能和图书馆纳入培训数据而笼统地使用这些隐性功能和图书馆。相比之下,当人类程序员编写程序时,它们经常参考诸如代码手册、文档和导师等文本资源,以探索和理解可用的图书馆功能。根据这一观察,我们采用了DocCoder:一种方法,它明确利用了代码手册和文件,其方法是:(1) 根据NL的意图检索相关文件,(2) 根据NL的意图和检索的文件生成代码。我们的方法是一般性的,可以适用于任何编程语言,并且对基本神经模型具有概念性。我们证明DocCooorder不断改进NL到代码模型:DocCocoder在新的Bash数据库1.Coldr改进了CoyL基准上的11x准确性比强基线要高一些。

0

相关内容

代码（Code）是专知网的一个重要知识资料文档板块，旨在整理收录论文源代码、复现代码，经典工程代码等，便于用户查阅下载使用。

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

MARVELD1基因调控肝细胞癌介入治疗的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

吡咯喹啉醌三锂调控小胶质细胞极化治疗阿尔茨海默病的机制

国家自然科学基金

0+阅读 · 2015年12月31日

miR-1靶向LXRα调控糖尿病心肌中脂质代谢的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

miR-5591靶向AGER/ROS/JNK抑制MSCs氧化应激损伤在糖尿病创面修复中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

乳房外Paget病中错配修复基因功能异常与PI3K/AKT通路在侵袭中的作用

国家自然科学基金

0+阅读 · 2014年12月31日

Iotrochota属海绵及其内生菌中含氮化合物及其抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

遍历哈密顿系统的谱理论

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

肝纤维化恢复期TRAIL对星状细胞增殖的调控

国家自然科学基金

0+阅读 · 2008年12月31日

AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog

Arxiv

0+阅读 · 2022年9月8日

Understanding Questions that Arise When Working with Business Documents

Arxiv

0+阅读 · 2022年9月8日

Relationship between Gender and Code Reading Speed in Software Development

Arxiv

0+阅读 · 2022年9月8日

Automatic Code Documentation Generation Using GPT-3

Arxiv

0+阅读 · 2022年9月6日

Multi-Figurative Language Generation

Arxiv

0+阅读 · 2022年9月5日

Using Consensual Biterms from Text Structures of Requirements and Code to Improve IR-Based Traceability Recovery

Arxiv

0+阅读 · 2022年9月5日

ArgLegalSumm: Improving Abstractive Summarization of Legal Documents with Argument Mining

Arxiv

0+阅读 · 2022年9月4日

Cooperative coevolutionary Modified Differential Evolution with Distance-based Selection for Large-Scale Optimization Problems through an automatic Random Grouping

Arxiv

1+阅读 · 2022年9月2日

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Arxiv

28+阅读 · 2021年6月16日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

VIP会员

文章信息

相关主题

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog

Arxiv

0+阅读 · 2022年9月8日

Understanding Questions that Arise When Working with Business Documents

Arxiv

0+阅读 · 2022年9月8日

Relationship between Gender and Code Reading Speed in Software Development

Arxiv

0+阅读 · 2022年9月8日

Automatic Code Documentation Generation Using GPT-3

Arxiv

0+阅读 · 2022年9月6日

Multi-Figurative Language Generation

Arxiv

0+阅读 · 2022年9月5日

Using Consensual Biterms from Text Structures of Requirements and Code to Improve IR-Based Traceability Recovery

Arxiv

0+阅读 · 2022年9月5日

ArgLegalSumm: Improving Abstractive Summarization of Legal Documents with Argument Mining

Arxiv

0+阅读 · 2022年9月4日

Cooperative coevolutionary Modified Differential Evolution with Distance-based Selection for Large-Scale Optimization Problems through an automatic Random Grouping

Arxiv

1+阅读 · 2022年9月2日

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Arxiv

28+阅读 · 2021年6月16日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

相关基金

MARVELD1基因调控肝细胞癌介入治疗的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

吡咯喹啉醌三锂调控小胶质细胞极化治疗阿尔茨海默病的机制

国家自然科学基金

0+阅读 · 2015年12月31日

miR-1靶向LXRα调控糖尿病心肌中脂质代谢的作用及其机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

miR-5591靶向AGER/ROS/JNK抑制MSCs氧化应激损伤在糖尿病创面修复中的作用及机制

国家自然科学基金

0+阅读 · 2015年12月31日

乳房外Paget病中错配修复基因功能异常与PI3K/AKT通路在侵袭中的作用

国家自然科学基金

0+阅读 · 2014年12月31日

Iotrochota属海绵及其内生菌中含氮化合物及其抗肿瘤活性研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

遍历哈密顿系统的谱理论

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

肝纤维化恢复期TRAIL对星状细胞增殖的调控

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员