codeXGLUE:用于了解和生成守则的机械学习基准数据集 (CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation) - 专知论文

会员服务 ·

0

可理解性 · Machine Learning · 数据集 · 学成 · 基准 ·

2021 年 3 月 16 日

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

翻译：codeXGLUE:用于了解和生成守则的机械学习基准数据集

Shuai Lu,Daya Guo,Shuo Ren,Junjie Huang,Alexey Svyatkovskiy,Ambrosio Blanco,Colin Clement,Dawn Drain,Daxin Jiang,Duyu Tang,Ge Li,Lidong Zhou,Linjun Shou,Long Zhou,Michele Tufano,Ming Gong,Ming Zhou,Nan Duan,Neel Sundaresan,Shao Kun Deng,Shengyu Fu,Shujie Liu

from arxiv, 14 pages; Revise CodeBLEU scores for all models on text-to-code task

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

翻译：基准数据集对加速编制方案语言任务的研究具有重大影响。在本文件中,我们引入了CodXGLUE,这是一个基准数据集,用于促进机器学习研究,以促进了解和生成方案。 CodXGLUE包括一个跨14个数据集的10项任务汇编和一个模型评估和比较平台。 CodXGLUE还包含三个基准系统,包括BERT型、GPT型和Encoder-Decoder型模型,使研究人员能够方便地使用该平台。这些数据和基线的提供有助于开发和验证可用于各种方案理解和生成问题的新方法。

0

相关内容

可理解性

【大规模机器学习】综述论文，20页pdf，A Survey on Large-scale Machine

【大规模机器学习】综述论文，20页pdf，A Survey on Large-scale Machine

专知会员服务

66+阅读 · 2020年8月13日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

【Facebook AI】对抗性NLI:自然语言理解的新基准，Adversarial NLI: A New Benchmark for Natural Language Understanding

【Facebook AI】对抗性NLI:自然语言理解的新基准，Adversarial NLI: A New Benchmark for Natural Language Understanding

专知会员服务

11+阅读 · 2019年11月2日

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

专知会员服务

62+阅读 · 2019年10月26日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

207+阅读 · 2019年9月30日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Deep Learning for Hindi Text Classification: A Comparison

Arxiv

4+阅读 · 2020年1月19日

Adversarial NLI: A New Benchmark for Natural Language Understanding

Arxiv

4+阅读 · 2019年10月31日

DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs

Arxiv

3+阅读 · 2019年10月1日

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Arxiv

3+阅读 · 2019年9月26日

Learning Deep Transformer Models for Machine Translation

Learning Deep Transformer Models for Machine Translation

Arxiv

3+阅读 · 2019年6月5日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

Multi-Task Deep Neural Networks for Natural Language Understanding

Multi-Task Deep Neural Networks for Natural Language Understanding

Arxiv

3+阅读 · 2019年1月31日

Multi-task learning to improve natural language understanding

Arxiv

4+阅读 · 2018年12月17日

TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

Arxiv

7+阅读 · 2018年3月28日

Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing

Arxiv

11+阅读 · 2018年2月16日

VIP会员

文章信息

相关主题

Machine Learning

相关VIP内容

【大规模机器学习】综述论文，20页pdf，A Survey on Large-scale Machine

【大规模机器学习】综述论文，20页pdf，A Survey on Large-scale Machine

专知会员服务

66+阅读 · 2020年8月13日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

【Facebook AI】对抗性NLI:自然语言理解的新基准，Adversarial NLI: A New Benchmark for Natural Language Understanding

【Facebook AI】对抗性NLI:自然语言理解的新基准，Adversarial NLI: A New Benchmark for Natural Language Understanding

专知会员服务

11+阅读 · 2019年11月2日

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

专知会员服务

62+阅读 · 2019年10月26日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

207+阅读 · 2019年9月30日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

自然语言处理 (NLP)资源大全

自然语言处理 (NLP)资源大全

机械鸡

35+阅读 · 2017年9月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Deep Learning for Hindi Text Classification: A Comparison

Arxiv

4+阅读 · 2020年1月19日

Adversarial NLI: A New Benchmark for Natural Language Understanding

Arxiv

4+阅读 · 2019年10月31日

DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs

Arxiv

3+阅读 · 2019年10月1日

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

Arxiv

3+阅读 · 2019年9月26日

Learning Deep Transformer Models for Machine Translation

Learning Deep Transformer Models for Machine Translation

Arxiv

3+阅读 · 2019年6月5日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

Multi-Task Deep Neural Networks for Natural Language Understanding

Multi-Task Deep Neural Networks for Natural Language Understanding

Arxiv

3+阅读 · 2019年1月31日

Multi-task learning to improve natural language understanding

Arxiv

4+阅读 · 2018年12月17日

TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

Arxiv

7+阅读 · 2018年3月28日

Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing

Arxiv

11+阅读 · 2018年2月16日

微信扫码咨询专知VIP会员