We present MBXP, an execution-based code completion benchmark in 10+ programming languages. This collection of datasets is generated by our conversion framework, which translates prompts and test cases from the original MBPP dataset into the corresponding data in each target language. This benchmark enables us to evaluate code generation models in a multi-lingual fashion and, in particular, to study the generalization ability of language models on out-of-domain languages, the advantages of large multi-lingual models over mono-lingual ones, the benefits of few-shot prompting, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping, obtaining synthetic canonical solutions in several languages. These solutions can be used for other code-related evaluations, such as insertion-based, summarization, or code translation tasks, for which we demonstrate results and which we release as part of our benchmark.
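To make the execution-based evaluation concrete, below is a minimal sketch of how a benchmark in the style of MBXP can score a model completion: the function-signature prompt, the model-generated body, and the converted test cases are concatenated and run in a subprocess, and the problem counts as solved if the program exits cleanly. All names here (evaluate_completion, the sample problem) are illustrative assumptions, not the benchmark's actual API.

    import subprocess
    import sys
    import tempfile
    import textwrap

    def evaluate_completion(prompt: str, completion: str, test_code: str,
                            timeout: float = 10.0) -> bool:
        """Concatenate prompt + completion + converted tests, execute the
        result in a subprocess, and report pass/fail (hypothetical helper)."""
        program = prompt + completion + "\n" + test_code
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(program)
            path = f.name
        try:
            # Exit code 0 means every assert in the converted tests passed.
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, timeout=timeout)
            return result.returncode == 0
        except subprocess.TimeoutExpired:
            return False

    # An MBPP-style problem; the conversion framework would emit the analogous
    # prompt and asserts in each target language (e.g., Java, Kotlin, Ruby).
    prompt = textwrap.dedent('''\
        def remove_odd(lst):
            """Remove the odd numbers from a given list."""
    ''')
    completion = "    return [x for x in lst if x % 2 == 0]\n"
    tests = "assert remove_odd([1, 2, 3, 4]) == [2, 4]\n"
    print(evaluate_completion(prompt, completion, tests))  # True if tests pass

The same pass/fail protocol applies in every target language; only the prompt syntax, the test-case syntax, and the execution command change per language.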