CUGE: 中文理解和世代评估基准 (CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark) - 专知论文

会员服务 ·

0

可理解性 · MoDELS · 语言模型化 · Performer · 模型性能 ·

2021 年 12 月 27 日

CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark

翻译：CUGE: 中文理解和世代评估基准

Yuan Yao,Qingxiu Dong,Jian Guan,Boxi Cao,Zhengyan Zhang,Chaojun Xiao,Xiaozhi Wang,Fanchao Qi,Junwei Bao,Jinran Nie,Zheni Zeng,Yuxian Gu,Kun Zhou,Xuancheng Huang,Wenhao Li,Shuhuai Ren,Jinliang Lu,Chengqiang Xu,Huadong Wang,Guoyang Zeng,Zile Zhou,Jiajun Zhang,Juanzi Li,Minlie Huang,Rui Yan,Xiaodong He,Xiaojun Wan,Xin Zhao,Xu Sun,Yang Liu,Zhiyuan Liu,Xianpei Han,Erhong Yang,Zhifang Sui,Maosong Sun

Realizing general-purpose language intelligence has been a longstanding goal for natural language processing, where standard evaluation benchmarks play a fundamental and guiding role. We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic. To this end, we propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework. To facilitate CUGE, we provide a public leaderboard that can be customized to support flexible model judging criteria. Evaluation results on representative pre-trained language models indicate ample room for improvement towards general-purpose language intelligence. CUGE is publicly available at cuge.baai.ac.cn.

翻译：实现通用语言情报是自然语言处理的长期目标,标准评价基准在其中起着根本和指导作用。我们争辩说,对于通用语言情报评价,基准本身必须是全面和系统的。为此,我们提议CUGE,即中国语言理解和生成评价基准,其特点如下:(1) 等级基准框架,其中主要选择数据集,并以语言能力任务-数据集等级排列。(2) 多级评分战略,其中根据等级框架提供不同程度的示范业绩。我们为CUGE提供一个公共领导板,可以定制,支持灵活的示范评分标准。关于有代表性的预先培训的语言模型的评价结果表明,在通用语言情报方面有很大的改进余地。CUGE在cuge.baai.c.n公开提供。

0

相关内容

可理解性

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

随机变量结构的模型论

国家自然科学基金

0+阅读 · 2013年12月31日

物联网RFID安全协议设计与验证研究

国家自然科学基金

0+阅读 · 2013年12月31日

整合关键基础设施系统的应急供应链管理模型及算法

国家自然科学基金

0+阅读 · 2013年12月31日

面向CTA的计算机辅助诊断与虚拟介入治疗

国家自然科学基金

1+阅读 · 2012年12月31日

机载阵列下视SAR高分辨率成像模型与处理方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于压缩感知的稀疏阵列MIMO-SAR成像及动目标检测

国家自然科学基金

0+阅读 · 2012年12月31日

医疗服务中的资源调度与优化

国家自然科学基金

4+阅读 · 2011年12月31日

关于图顶点划分的 Thomassen 猜想

国家自然科学基金

0+阅读 · 2011年12月31日

物联网隐私保护安全关键技术研究

国家自然科学基金

9+阅读 · 2011年12月31日

复杂环境下的目标检测识别与跟踪若干关键问题研究

国家自然科学基金

7+阅读 · 2011年12月31日

Point-Level Region Contrast for Object Detection Pre-Training

Arxiv

1+阅读 · 2022年4月19日

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Arxiv

1+阅读 · 2022年4月18日

Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey

Arxiv

0+阅读 · 2022年4月18日

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Arxiv

0+阅读 · 2022年4月16日

Evaluation Benchmarks for Spanish Sentence Representations

Arxiv

0+阅读 · 2022年4月15日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Pre-trained Models for Natural Language Processing: A Survey

Arxiv

113+阅读 · 2020年3月18日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

325+阅读 · 2020年11月26日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《在单一作战合成环境（SSE）中运用人工智能与大型语言模型以提供灵活人文地形及可信角色组》报告

《俄罗斯的未来战争方式第二部分：核威慑》报告

《提示战争：大语言模型如何决定军事干预》报告

《俄罗斯的未来战争方式第三部分：军事改革》报告

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Point-Level Region Contrast for Object Detection Pre-Training

Arxiv

1+阅读 · 2022年4月19日

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Arxiv

1+阅读 · 2022年4月18日

Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey

Arxiv

0+阅读 · 2022年4月18日

Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

Arxiv

0+阅读 · 2022年4月16日

Evaluation Benchmarks for Spanish Sentence Representations

Arxiv

0+阅读 · 2022年4月15日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Pre-trained Models for Natural Language Processing: A Survey

Arxiv

113+阅读 · 2020年3月18日

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Arxiv

16+阅读 · 2019年4月2日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

相关基金

随机变量结构的模型论

国家自然科学基金

0+阅读 · 2013年12月31日

物联网RFID安全协议设计与验证研究

国家自然科学基金

0+阅读 · 2013年12月31日

整合关键基础设施系统的应急供应链管理模型及算法

国家自然科学基金

0+阅读 · 2013年12月31日

面向CTA的计算机辅助诊断与虚拟介入治疗

国家自然科学基金

1+阅读 · 2012年12月31日

机载阵列下视SAR高分辨率成像模型与处理方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于压缩感知的稀疏阵列MIMO-SAR成像及动目标检测

国家自然科学基金

0+阅读 · 2012年12月31日

医疗服务中的资源调度与优化

国家自然科学基金

4+阅读 · 2011年12月31日

关于图顶点划分的 Thomassen 猜想

国家自然科学基金

0+阅读 · 2011年12月31日

物联网隐私保护安全关键技术研究

国家自然科学基金

9+阅读 · 2011年12月31日

复杂环境下的目标检测识别与跟踪若干关键问题研究

国家自然科学基金

7+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员