Code generation models based on the pre-training and fine-tuning paradigm have been increasingly explored by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To validate the performance of these models, multiple benchmarks (e.g., AiXBench and HumanEval) have been proposed, but they include only cases of generating standalone functions, i.e., functions that invoke or access only built-in functions and standard libraries. However, standalone functions constitute only about 30\% of functions in real open-source projects. To assess a model's performance on pragmatic code generation (i.e., code generation for real settings of open-source or proprietary code), in this paper, we propose CoderEval, a benchmark for pragmatic code generation with generative pre-trained models. Compared with the widely used HumanEval benchmark from OpenAI, CoderEval can be used to assess the performance of models on pragmatic code generation beyond just generating standalone functions. Through the evaluation of three publicly available models (CodeGen, PanGu-Coder, and Codex) on CoderEval, we analyze and discuss the current progress and future directions of pragmatic code generation with generative pre-trained models.