文字自动升级:文字分类学习构成强化政策 (Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification) - 专知论文

会员服务 ·

0

AutoAugment · 数据增强 · 文本分类 · Performer · 泛化理论 ·

2021 年 9 月 1 日

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

翻译：文字自动升级:文字分类学习构成强化政策

Shuhuai Ren,Jinchao Zhang,Lei Li,Xu Sun,Jie Zhou

from arxiv, Accepted by EMNLP 2021 main conference (Long Paper)

Data augmentation aims to enrich training samples for alleviating the overfitting issue in low-resource or class-imbalanced situations. Traditional methods first devise task-specific operations such as Synonym Substitute, then preset the corresponding parameters such as the substitution rate artificially, which require a lot of prior knowledge and are prone to fall into the sub-optimum. Besides, the number of editing operations is limited in the previous methods, which decreases the diversity of the augmented data and thus restricts the performance gain. To overcome the above limitations, we propose a framework named Text AutoAugment (TAA) to establish a compositional and learnable paradigm for data augmentation. We regard a combination of various operations as an augmentation policy and utilize an efficient Bayesian Optimization algorithm to automatically search for the best policy, which substantially improves the generalization capability of models. Experiments on six benchmark datasets show that TAA boosts classification accuracy in low-resource and class-imbalanced regimes by an average of 8.8% and 9.7%, respectively, outperforming strong baselines.

翻译：增加数据的目的是为了丰富培训样本,以缓解低资源或班级平衡情况下的过度适应问题。传统方法首先设计任务特定操作,如同义词替代,然后人为地预先设定相应的参数,如替换率,这需要许多先前的知识,容易落入次最佳状态。此外,编辑操作的数量在以前的方法中是有限的,这减少了增加的数据的多样性,从而限制了绩效收益。为了克服上述限制,我们提议了一个名为“文本自动启动”的框架,以建立一个组成和可学习的数据增强范式。我们认为,各种操作的组合是一种增强政策,并使用高效的巴耶西亚最佳化算法自动寻找最佳政策,这大大改进了模型的普及能力。对六个基准数据集的实验表明,TA提高低资源和班级平衡制度中的分类准确性,平均为8.8%和9.7%,比强基线强。

0

相关内容

AutoAugment

文本分类数据增强综述

专知会员服务

66+阅读 · 2021年7月11日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【论文推荐】Short Text Classiﬁcation via Term Graph 基于术语图的短文本分类

【论文推荐】Short Text Classiﬁcation via Term Graph 基于术语图的短文本分类

专知会员服务

20+阅读 · 2020年1月20日

【强化学习轻松入门】《Reinforcement Learning 101》，Shweta Bhatt

【强化学习轻松入门】《Reinforcement Learning 101》，Shweta Bhatt

专知会员服务

50+阅读 · 2020年1月3日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

196+阅读 · 2019年10月10日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

学界 | 伯克利 AI 研究院提出新的数据增强算法，比谷歌大脑的 AutoAugment 更强！| ICML 2019

学界 | 伯克利 AI 研究院提出新的数据增强算法，比谷歌大脑的 AutoAugment 更强！| ICML 2019

AI科技评论

8+阅读 · 2019年6月26日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

谷歌推出新型数据增强算法：AutoAugment

谷歌推出新型数据增强算法：AutoAugment

论智

20+阅读 · 2018年6月6日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

专知

25+阅读 · 2018年4月15日

carla 学习笔记

carla 学习笔记

CreateAMind

9+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Learning interaction rules from multi-animal trajectories via augmented behavioral models

Arxiv

0+阅读 · 2021年10月25日

Few-shot Semantic Segmentation with Self-supervision from Pseudo-classes

Arxiv

0+阅读 · 2021年10月22日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

LearnDA: Learnable Knowledge-Guided Data Augmentation for Event Causality Identification

Arxiv

5+阅读 · 2021年6月3日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

Fast AutoAugment

Fast AutoAugment

Arxiv

5+阅读 · 2019年5月1日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Graph Convolutional Networks for Text Classification

Arxiv

11+阅读 · 2018年10月17日

VIP会员

文章信息

相关主题

相关VIP内容

文本分类数据增强综述

专知会员服务

66+阅读 · 2021年7月11日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【论文推荐】Short Text Classiﬁcation via Term Graph 基于术语图的短文本分类

【论文推荐】Short Text Classiﬁcation via Term Graph 基于术语图的短文本分类

专知会员服务

20+阅读 · 2020年1月20日

【强化学习轻松入门】《Reinforcement Learning 101》，Shweta Bhatt

【强化学习轻松入门】《Reinforcement Learning 101》，Shweta Bhatt

专知会员服务

50+阅读 · 2020年1月3日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

196+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

学界 | 伯克利 AI 研究院提出新的数据增强算法，比谷歌大脑的 AutoAugment 更强！| ICML 2019

学界 | 伯克利 AI 研究院提出新的数据增强算法，比谷歌大脑的 AutoAugment 更强！| ICML 2019

AI科技评论

8+阅读 · 2019年6月26日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

谷歌推出新型数据增强算法：AutoAugment

谷歌推出新型数据增强算法：AutoAugment

论智

20+阅读 · 2018年6月6日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

【论文推荐】最新六篇图像分割相关论文—控制、全卷积网络、子空间表示、多模态图像分割

专知

25+阅读 · 2018年4月15日

carla 学习笔记

carla 学习笔记

CreateAMind

9+阅读 · 2018年2月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Learning interaction rules from multi-animal trajectories via augmented behavioral models

Arxiv

0+阅读 · 2021年10月25日

Few-shot Semantic Segmentation with Self-supervision from Pseudo-classes

Arxiv

0+阅读 · 2021年10月22日

A Survey on Data Augmentation for Text Classification

A Survey on Data Augmentation for Text Classification

Arxiv

16+阅读 · 2021年7月7日

LearnDA: Learnable Knowledge-Guided Data Augmentation for Event Causality Identification

Arxiv

5+阅读 · 2021年6月3日

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Arxiv

3+阅读 · 2021年1月29日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

Fast AutoAugment

Fast AutoAugment

Arxiv

5+阅读 · 2019年5月1日

Learning to Weight for Text Classification

Learning to Weight for Text Classification

Arxiv

8+阅读 · 2019年3月28日

Graph Convolutional Networks for Text Classification

Arxiv

11+阅读 · 2018年10月17日

微信扫码咨询专知VIP会员