自然语言处理中的数据扩充方法:调查 (Data Augmentation Approaches in Natural Language Processing: A Survey) - 专知论文

会员服务 ·

0

数据增强 · Processing（编程语言） · 多样性 · Learning · 测试数据 ·

2022 年 6 月 27 日

Data Augmentation Approaches in Natural Language Processing: A Survey

翻译：自然语言处理中的数据扩充方法:调查

Bohan Li,Yutai Hou,Wanxiang Che

from arxiv, accepted by AI Open (2022) at https://www.sciencedirect.com/science/article/pii/S2666651022000080 ; 55 pages, 12 figures, 3 tables

As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks. One of the main focuses of the DA methods is to improve the diversity of training data, thereby helping the model to better generalize to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of augmented data, including paraphrasing, noising, and sampling. Our paper sets out to analyze DA methods in detail according to the above categories. Further, we also introduce their applications in NLP tasks as well as the challenges. Some helpful resources are provided in the appendix.

翻译：作为一项有效的战略,数据增强(DA)可以缓解深层学习技术可能失败的数据稀缺情况,在计算机视野中广泛应用,然后引入自然语言处理,并改进许多任务,DA方法的主要重点之一是改进培训数据的多样性,从而帮助模型更好地概括到看不见的测试数据,在这项调查中,我们将DA方法分为三类,其依据是扩大数据的多样性,包括分解、点名和抽样。我们的文件根据上述类别详细分析DA方法。此外,我们还将在NLP任务中应用这些方法以及挑战。一些有用的资源在附录中提供。

0

相关内容

数据增强

数据增强在机器学习领域多指采用一些方法（比如数据蒸馏，正负样本均衡等）来提高模型数据集的质量，增强数据。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

专知会员服务

80+阅读 · 2020年3月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

大型风电机组实时可靠性评估与预防维护策略研究

国家自然科学基金

0+阅读 · 2014年12月31日

TIM-3 促上皮-间质转化诱导骨肉瘤侵袭及转移的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Cullin1通过调控TIMP-2泛素化影响乳腺癌转移的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

低温流体蒸发-中间介质冷凝传热模型与过程强化研究

国家自然科学基金

0+阅读 · 2013年12月31日

液相费托合成反应的选择性调控新策略

国家自然科学基金

0+阅读 · 2012年12月31日

PTPN18的底物选择性和受调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

RI调控ILK信号通路抑制膀胱癌发生EMT及转移的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

合成气转化反应中产物选择性调控问题的研究

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

Role of Data Augmentation in Unsupervised Anomaly Detection

Arxiv

0+阅读 · 2022年8月16日

Learning to Prompt for Vision-Language Models

Arxiv

0+阅读 · 2022年8月12日

A Survey of Learning on Small Data

Arxiv

19+阅读 · 2022年7月29日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

24+阅读 · 2021年8月12日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Graph Neural Networks for Natural Language Processing: A Survey

Arxiv

36+阅读 · 2021年6月10日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

A Survey of the Usages of Deep Learning in Natural Language Processing

A Survey of the Usages of Deep Learning in Natural Language Processing

Arxiv

122+阅读 · 2019年9月11日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

深度学习自然语言处理综述论文，Natural Language Processing Advancements By Deep Learning: A Survey

专知会员服务

80+阅读 · 2020年3月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

数据要素发展报告(2025年)：附下载

人工智能代理提升战时舰船战备水平

【NeurIPS2025教程】大语言模型规划

NeurIPS 2025 教程：深度学习训练不稳定性的理论洞见

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Role of Data Augmentation in Unsupervised Anomaly Detection

Arxiv

0+阅读 · 2022年8月16日

Learning to Prompt for Vision-Language Models

Arxiv

0+阅读 · 2022年8月12日

A Survey of Learning on Small Data

Arxiv

19+阅读 · 2022年7月29日

A Survey of Natural Language Generation

Arxiv

15+阅读 · 2021年12月22日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing

Arxiv

24+阅读 · 2021年8月12日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Graph Neural Networks for Natural Language Processing: A Survey

Arxiv

36+阅读 · 2021年6月10日

On Feature Normalization and Data Augmentation

On Feature Normalization and Data Augmentation

Arxiv

15+阅读 · 2020年2月25日

A Survey of the Usages of Deep Learning in Natural Language Processing

A Survey of the Usages of Deep Learning in Natural Language Processing

Arxiv

122+阅读 · 2019年9月11日

相关基金

大型风电机组实时可靠性评估与预防维护策略研究

国家自然科学基金

0+阅读 · 2014年12月31日

TIM-3 促上皮-间质转化诱导骨肉瘤侵袭及转移的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Cullin1通过调控TIMP-2泛素化影响乳腺癌转移的分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

低温流体蒸发-中间介质冷凝传热模型与过程强化研究

国家自然科学基金

0+阅读 · 2013年12月31日

液相费托合成反应的选择性调控新策略

国家自然科学基金

0+阅读 · 2012年12月31日

PTPN18的底物选择性和受调控机制

国家自然科学基金

0+阅读 · 2012年12月31日

RI调控ILK信号通路抑制膀胱癌发生EMT及转移的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

合成气转化反应中产物选择性调控问题的研究

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员