从不平均培训数据中学习:未贴标签、单标签和多标签 (Learning from Uneven Training Data: Unlabeled, Single Label, and Multiple Labels) - 专知论文

会员服务 ·

0

标注 · 样例 · 训练数据 · 学成 · 未标记 ·

2021 年 9 月 9 日

Learning from Uneven Training Data: Unlabeled, Single Label, and Multiple Labels

翻译：从不平均培训数据中学习:未贴标签、单标签和多标签

Shujian Zhang,Chengyue Gong,Eunsol Choi

from arxiv, EMNLP 2021; Our code is publicly available at https://github.com/szhang42/Uneven_training_data

Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and inherent ambiguity of language, we hypothesize that single label is not sufficient to learn the spectrum of language interpretation. We explore new label annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi label examples at the cost of annotating fewer examples brings clear gains on natural language inference task and entity typing task, even when we simply first train with a single label data and then fine tune with multi label examples. Extending a MixUp data augmentation framework, we propose a learning algorithm that can learn from uneven training examples (with zero, one, or multiple labels). This algorithm efficiently combines signals from uneven training data and brings additional gains in low annotation budget and cross domain settings. Together, our method achieves consistent gains in both accuracy and label distribution metrics in two tasks, suggesting training with uneven training data can be beneficial for many NLP tasks.

翻译：培训NLP系统通常假定获得带有单个人类标签的附加说明数据。鉴于标记不完善,语言本身含混不清,我们假设单标签不足以学习语言翻译的广度。我们探索新的标签批注分配办法,为一小组培训实例提供多标签。采用这种多标签示例,以较少实例说明为代价,在自然语言推论任务和实体打字任务上取得了明显收益,即使我们只是先用单一标签数据进行训练,然后用多标签实例进行微调。扩展混集数据增强框架,我们建议一种学习算法,从不均衡的培训实例(零、一或多个标签)中学习。这种算法高效地结合了不均衡的培训数据信号,在低批注预算和跨域设置方面带来额外收益。我们的方法在两种任务中实现了准确性和标签分发指标的一致收益,建议培训数据不均有利于许多NLP任务。

0

相关内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【google】监督对比学习，Supervised Contrastive Learning

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

专知会员服务

26+阅读 · 2019年12月25日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

小样本学习（Few-shot Learning）综述

小样本学习（Few-shot Learning）综述

黑龙江大学自然语言处理实验室

28+阅读 · 2019年4月1日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

论文浅尝 | Learning with Noise: Supervised Relation Extraction

论文浅尝 | Learning with Noise: Supervised Relation Extraction

开放知识图谱

3+阅读 · 2018年1月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

MetaICL: Learning to Learn In Context

Arxiv

0+阅读 · 2021年10月29日

Novel View Synthesis from a Single Image via Unsupervised learning

Arxiv

1+阅读 · 2021年10月29日

CLLD: Contrastive Learning with Label Distance for Text Classificatioin

CLLD: Contrastive Learning with Label Distance for Text Classificatioin

Arxiv

0+阅读 · 2021年10月28日

Contrastive Learning with Hard Negative Samples

Arxiv

7+阅读 · 2020年10月9日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Low-Shot Learning from Imaginary Data

Arxiv

15+阅读 · 2018年4月3日

Active Learning from Positive and Unlabeled Data

Arxiv

3+阅读 · 2016年2月24日

VIP会员

文章信息

相关主题

相关VIP内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【google】监督对比学习，Supervised Contrastive Learning

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

【图解自监督学习】《The Illustrated Self-Supervised Learning》by Amit Chaudhary

专知会员服务

43+阅读 · 2020年2月25日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

专知会员服务

26+阅读 · 2019年12月25日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

AI智能体时代中的记忆：形式、功能与动态综述

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

【斯坦福博士论文】面向地理空间数据的多模态与多尺度建模：时空生成式人工智能

多模态大语言模型下游调优中“保持自我”的重要性

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

小样本学习（Few-shot Learning）综述

小样本学习（Few-shot Learning）综述

黑龙江大学自然语言处理实验室

28+阅读 · 2019年4月1日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

【TED】生命中的每一年的智慧

【TED】生命中的每一年的智慧

英语演讲视频每日一推

10+阅读 · 2019年1月29日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

论文浅尝 | Learning with Noise: Supervised Relation Extraction

论文浅尝 | Learning with Noise: Supervised Relation Extraction

开放知识图谱

3+阅读 · 2018年1月4日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

MetaICL: Learning to Learn In Context

Arxiv

0+阅读 · 2021年10月29日

Novel View Synthesis from a Single Image via Unsupervised learning

Arxiv

1+阅读 · 2021年10月29日

CLLD: Contrastive Learning with Label Distance for Text Classificatioin

CLLD: Contrastive Learning with Label Distance for Text Classificatioin

Arxiv

0+阅读 · 2021年10月28日

Contrastive Learning with Hard Negative Samples

Arxiv

7+阅读 · 2020年10月9日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

Multi-Task Self-Supervised Learning for Disfluency Detection

Arxiv

5+阅读 · 2019年8月15日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Low-Shot Learning from Imaginary Data

Arxiv

15+阅读 · 2018年4月3日

Active Learning from Positive and Unlabeled Data

Arxiv

3+阅读 · 2016年2月24日

微信扫码咨询专知VIP会员