Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution


Data augmentation · Concatenation · Decomposition · Diversity · BLEU

July 2, 2021


Toan Q. Nguyen, Kenton Murray, David Chiang

From arXiv. Accepted at IWSLT 2021.

In this paper, we investigate the driving factors behind concatenation, a simple but effective data augmentation method for low-resource neural machine translation. Our experiments suggest that discourse context is unlikely to be the cause of the roughly +1 BLEU improvement observed across four language pairs. Instead, we show that the improvement comes from three other factors unrelated to discourse: context diversity, length diversity, and, to a lesser extent, position shifting.
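To make the three factors concrete, here is a minimal sketch of concatenation-based augmentation. The abstract does not spell out the exact recipe, so this assumes that each augmented example joins two sentence pairs with a space and that the concatenated pairs are appended to the original corpus. The helper name `concat_augment` and its `mode` and `seed` parameters are hypothetical, chosen for illustration only. The `"consecutive"` mode keeps document order (so the concatenation carries discourse context), while the `"random"` mode pairs each sentence with an arbitrary other sentence (no discourse context, more varied context).

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def concat_augment(pairs: List[Pair], mode: str = "random", seed: int = 0) -> List[Pair]:
    """Hypothetical helper: return the original pairs followed by concatenated pairs.

    mode="consecutive": join pair i with pair i+1 (document order, so the
                        concatenation carries discourse context).
    mode="random":      join pair i with a randomly chosen other pair
                        (no discourse context, more varied context).
    Joining with a single space is an assumption about tokenized text.
    """
    if mode not in ("consecutive", "random"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    # Keeping the originals mixes short and long examples (length diversity).
    augmented: List[Pair] = list(pairs)
    n = len(pairs)
    if n < 2:
        return augmented  # nothing to concatenate
    for i, (src, tgt) in enumerate(pairs):
        if mode == "consecutive":
            if i + 1 == n:
                break  # the last sentence has no successor
            j = i + 1
        else:
            # Draw a partner uniformly from all indices except i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
        src2, tgt2 = pairs[j]
        # Pair j now also appears at a shifted position, after pair i.
        augmented.append((f"{src} {src2}", f"{tgt} {tgt2}"))
    return augmented


pairs = [("a b", "A B"), ("c", "C"), ("d e f", "D E F")]
print(concat_augment(pairs, mode="consecutive")[3:])
# [('a b c', 'A B C'), ('c d e f', 'C D E F')]
```

In this sketch, comparing the two modes isolates discourse context: both produce longer examples and shift sentences to new positions, but only `"consecutive"` preserves neighboring-sentence context. The abstract's finding is that the gain does not hinge on that context.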
