Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists for data wrangling and analytics tasks. To measure the performance of AI pair programmers that automatically synthesize programs for such tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1,082 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems drawn from the same notebook. It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states, as well as previous turns of interaction. To establish a strong baseline on this challenging task, we develop PaChiNCo, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs. Finally, we explore few-shot prompting strategies that elicit code with step-by-step decomposition and NL explanations, showing their potential to improve the diversity and explainability of model predictions.
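To make the task format concrete, the following is a minimal, hypothetical sketch of the kind of multi-turn NL-to-code problem described above; the data, intents, and variable names are invented for illustration and are not drawn from the benchmark itself.

    import pandas as pd

    # Context cell already present in the notebook: a small DataFrame
    # the model must reason about.
    df = pd.DataFrame({
        "city": ["Tokyo", "Delhi", "Shanghai", "Tokyo"],
        "year": [2020, 2020, 2021, 2021],
        "population_m": [37.4, 30.3, 27.8, 37.3],
    })

    # Turn 1 intent: "What is the average population of each city?"
    avg_pop = df.groupby("city")["population_m"].mean()

    # Turn 2 intent, which builds on the result of turn 1:
    # "Keep only the cities whose average is above 30 million."
    large_cities = avg_pop[avg_pop > 30]

Note how the second turn can only be answered by tracking the variable produced in the first turn together with the schema of df, which is exactly the rich multi-modal context the benchmark stresses.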
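The few-shot prompting strategy mentioned above elicits code interleaved with step-by-step NL explanations. A minimal sketch of what such a decomposed prediction might look like follows; the intent and data are invented for illustration.

    import pandas as pd

    df = pd.DataFrame({"product": ["A", "B", "A"], "sales": [10, 25, 12]})

    # Intent: "Which product has the highest total sales?"
    # Step 1: aggregate sales per product.
    totals = df.groupby("product")["sales"].sum()
    # Step 2: take the product whose aggregate is largest.
    best_product = totals.idxmax()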