Data augmentation is a widely used technique in machine learning to improve model performance. However, existing data augmentation techniques for natural language understanding (NLU) may not fully capture the complexity of natural language variation, and they can be difficult to apply to large datasets. This paper proposes Random Position Noise (RPN), a novel data augmentation algorithm that operates at the word-vector level. RPN modifies the word embeddings of the original text by introducing noise scaled by the existing values of selected word vectors, allowing finer-grained modifications that better capture natural language variation. Unlike traditional data augmentation methods, RPN does not require gradients in the computational graph during virtual-sample updates, making it simpler to apply to large datasets. Experimental results on seven NLU tasks, including sentiment analysis, natural language inference, and paraphrase detection, show that RPN consistently outperforms existing data augmentation techniques and achieves state-of-the-art performance. Moreover, RPN performs well in low-resource settings and is applicable to any model with a word embedding layer. These results highlight RPN as a promising approach for improving NLU performance and for addressing the challenges that traditional data augmentation techniques face in large-scale NLU tasks.
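To make the description concrete, the following is a minimal sketch of what an RPN-style augmentation step could look like, based only on the abstract: noise is injected at randomly selected token positions of the embedding tensor, scaled by the embeddings' existing values, with virtual samples built outside the computational graph (no gradients). The function name and parameters (`rpn_augment`, `mask_prob`, `noise_scale`) are illustrative assumptions, not the paper's actual API.

```python
# Sketch of an RPN-style augmentation step (assumed interpretation of the abstract).
import torch

def rpn_augment(embeddings: torch.Tensor,
                mask_prob: float = 0.15,
                noise_scale: float = 0.1) -> torch.Tensor:
    """Return a noised copy of a (batch, seq_len, dim) embedding tensor."""
    with torch.no_grad():  # virtual samples require no gradients, per the abstract
        # Randomly choose which token positions receive noise.
        position_mask = (
            torch.rand(embeddings.shape[:2], device=embeddings.device) < mask_prob
        ).unsqueeze(-1)
        # Noise magnitude depends on the existing values of the selected word vectors.
        noise = noise_scale * embeddings * torch.randn_like(embeddings)
        return torch.where(position_mask, embeddings + noise, embeddings)

# Example usage: augmented = rpn_augment(model.get_input_embeddings()(input_ids))
```

Because the whole operation runs under `torch.no_grad()`, the augmented (virtual) samples add no nodes to the computational graph, which is consistent with the abstract's claim that RPN scales easily to large datasets.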