A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing CEMs can be categorized into match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) and execution-based CEMs (e.g., AvgPassRatio and Pass@k), but both suffer from limitations. The former only measure differences in surface form regardless of the functional equivalence of code, while the latter incur huge execution overhead, including collecting expensive test cases, resolving tedious execution dependencies, and enormous execution time. To address these issues, in this paper, we propose CodeScore, an efficient and effective CEM for code generation, which estimates the test-case PassRatio of generated code without executing it. We also present a framework named UniCE for training unified code evaluation models by learning code execution, i.e., learning the PassRatio and Executability of generated code. To learn code execution comprehensively, we construct more than 100 test cases for each task in several popular benchmark datasets, covering MBPP, APPS, and HumanEval. Experimental results show that CodeScore achieves state-of-the-art correlation with execution-based CEMs: CodeScore is strongly correlated with AvgPassRatio, and binary CodeScore is moderately correlated with Pass@1. In particular, CodeScore eliminates the need for test cases and execution dependencies during inference, and reduces execution time by three orders of magnitude compared to AvgPassRatio and Pass@1.
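For reference, the execution-based CEMs that CodeScore is trained to approximate can be computed as follows. This is a minimal illustrative sketch, not the paper's implementation: the test-case format (argument tuple paired with an expected output) and the function names `pass_ratio` and `pass_at_k` are assumptions; `pass_at_k` follows the standard unbiased Pass@k estimator.

```python
import math
from typing import Callable, List, Tuple


def pass_ratio(candidate: Callable, test_cases: List[Tuple[tuple, object]]) -> float:
    """Fraction of test cases the candidate solution passes (PassRatio)."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failed test cases
    return passed / len(test_cases)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes all test cases."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Hypothetical usage on a toy task ("add two numbers"):
generated = lambda a, b: a + b
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(pass_ratio(generated, tests))   # 1.0
print(pass_at_k(n=10, c=3, k=1))      # 0.3
```

AvgPassRatio averages `pass_ratio` over all tasks in a benchmark, and both metrics require actually executing the generated code against the test suite; CodeScore instead predicts the PassRatio directly from the code text.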