Data-centric AI has recently proven to be more effective and higher-performing, while traditional model-centric AI delivers diminishing returns. Data-centric AI emphasizes improving the quality of datasets to achieve better model performance. This field has significant potential because of its great practicality, and it is attracting more and more attention. However, we have not seen significant research progress in this field, especially in NLP. We propose DataCLUE, the first data-centric benchmark in the NLP field. We also provide three simple but effective baselines to foster research in this field (improving Macro-F1 by up to 5.7 percentage points). In addition, we conduct comprehensive experiments with human annotators and show the difficulty of DataCLUE. We also try an advanced method: a forgetting-informed bootstrapping label correction method. All the resources related to DataCLUE, including the dataset, toolkit, leaderboard, and baselines, are available online at https://github.com/CLUEbenchmark/DataCLUE
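To make the idea of forgetting-informed bootstrapping label correction concrete, the following is a minimal, hypothetical sketch of one way such a procedure could be organized: track how often each example's current label is "forgotten" across training epochs, and relabel examples whose labels the model confidently and consistently disputes with the model's own consensus prediction. The function names, thresholds, and overall loop are illustrative assumptions, not the implementation used in DataCLUE.

```python
# Hypothetical sketch: forgetting-informed bootstrapping label correction.
# Thresholds and the selection rule are illustrative assumptions.
import numpy as np

def forgetting_counts(correct_per_epoch: np.ndarray) -> np.ndarray:
    """Count forgetting events per example.

    correct_per_epoch: bool array of shape (num_epochs, num_examples),
    True where the model's prediction matched the current label at that epoch.
    A forgetting event is a transition from correct (True) to incorrect (False).
    """
    forgot = correct_per_epoch[:-1] & ~correct_per_epoch[1:]
    return forgot.sum(axis=0)

def bootstrap_correct_labels(labels: np.ndarray,
                             probs_per_epoch: np.ndarray,
                             min_confidence: float = 0.9,
                             min_forgets: int = 2) -> np.ndarray:
    """Relabel examples whose (possibly noisy) labels look unreliable.

    labels:          (num_examples,) current labels.
    probs_per_epoch: (num_epochs, num_examples, num_classes) softmax outputs
                     recorded while training on the current labels.
    Returns a corrected copy of `labels`.
    """
    preds_per_epoch = probs_per_epoch.argmax(axis=-1)          # (E, N)
    correct_per_epoch = preds_per_epoch == labels[None, :]      # (E, N)
    forgets = forgetting_counts(correct_per_epoch)              # (N,)
    ever_learned = correct_per_epoch.any(axis=0)                # (N,)

    # Bootstrapped consensus prediction averaged over epochs.
    mean_probs = probs_per_epoch.mean(axis=0)                   # (N, C)
    consensus = mean_probs.argmax(axis=-1)                      # (N,)
    confidence = mean_probs.max(axis=-1)                        # (N,)

    # A label is suspicious if it was never learned or was forgotten often;
    # flip it only when the model's consensus disagrees and is confident.
    suspicious = ~ever_learned | (forgets >= min_forgets)
    flip = suspicious & (consensus != labels) & (confidence >= min_confidence)

    corrected = labels.copy()
    corrected[flip] = consensus[flip]
    return corrected
```

In this sketch, the corrected labels would then be fed back into another training round, so correction and retraining can be iterated in a bootstrapping loop.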