Text Augmentation in a Multi-Task View - Zhuanzhi Papers (专知论文)


Data augmentation · Examples · Origin · Input distribution · Performer

January 14, 2021

Text Augmentation in a Multi-Task View


Jason Wei, Chengyu Huang, Shiqi Xu, Soroush Vosoughi

From arXiv; accepted to EACL 2021

Traditional data augmentation aims to increase the coverage of the input distribution by generating augmented examples that strongly resemble original samples in an online fashion where augmented examples dominate training. In this paper, we propose an alternative perspective -- a multi-task view (MTV) of data augmentation -- in which the primary task trains on original examples and the auxiliary task trains on augmented examples. In MTV data augmentation, both original and augmented samples are weighted substantively during training, relaxing the constraint that augmented examples must resemble original data and thereby allowing us to apply stronger levels of augmentation. In empirical experiments using four common data augmentation techniques on three benchmark text classification datasets, we find that the MTV leads to higher and more robust performance improvements than traditional augmentation.
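The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact objective, so the weighted sum below, the `aug_weight` parameter, and the `random_deletion` augmentation are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted class probabilities."""
    return -math.log(probs[label])


def random_deletion(tokens, p, rng):
    """Hypothetical augmentation: drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def mtv_loss(orig_losses, aug_losses, aug_weight=0.5):
    """Assumed multi-task objective: weighted sum of the mean primary and auxiliary losses.

    orig_losses: per-example losses on original examples (primary task).
    aug_losses:  per-example losses on augmented examples (auxiliary task).
    aug_weight:  share of the total loss given to the auxiliary task (assumption).
    """
    if not 0.0 <= aug_weight <= 1.0:
        raise ValueError("aug_weight must lie in [0, 1]")
    primary = sum(orig_losses) / len(orig_losses)
    auxiliary = sum(aug_losses) / len(aug_losses)
    return (1.0 - aug_weight) * primary + aug_weight * auxiliary


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the movie was surprisingly good".split()
    augmented = random_deletion(tokens, p=0.3, rng=rng)
    # Made-up model outputs for one original and one augmented example of class 0.
    orig_losses = [cross_entropy([0.9, 0.1], 0)]
    aug_losses = [cross_entropy([0.6, 0.4], 0)]
    print(augmented, mtv_loss(orig_losses, aug_losses, aug_weight=0.5))
```

With `aug_weight=0.5` both tasks contribute equally, so neither the original nor the augmented examples dominate the gradient. Keeping the two means separate also makes the balance independent of how many augmented examples are generated per original.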



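Related content

Data augmentation

In machine learning, data augmentation generally refers to methods (for example, data distillation or balancing positive and negative samples) that improve the quality of a model's dataset and augment the data.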

