按代表制和逐步正规化改进多语种翻译 (Improving Multilingual Translation by Representation and Gradient Regularization) - 专知论文

会员服务 ·

0

正则化项 · NMT · MoDELS · 可约的 · 有向 ·

2022 年 1 月 18 日

Improving Multilingual Translation by Representation and Gradient Regularization

翻译：按代表制和逐步正规化改进多语种翻译

Yilin Yang,Akiko Eriguchi,Alexandre Muzio,Prasad Tadepalli,Stefan Lee,Hany Hassan

from arxiv, EMNLP 2021 (Oral). Code and data: https://github.com/yilinyang7/fairseq_multi_fix

Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low quality translations -- commonly failing to even produce outputs in the right target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems, trained on massive multilingual corpora. To address this issue, we propose a joint approach to regularize NMT models at both representation-level and gradient-level. At the representation level, we leverage an auxiliary target language prediction task to regularize decoder outputs to retain information about the target language. At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance by +5.59 and +10.38 BLEU on WMT and OPUS datasets respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.

翻译：多语言神经机器翻译(NMT)使一个模型能够为所有翻译方向服务,包括培训期间看不见的翻译,即零光翻译。尽管在理论上具有吸引力,但目前的模型往往产生低质量翻译,通常甚至不能以正确的目标语言产生产出。在这项工作中,我们注意到,即使在强大的多语种系统中,在大规模多语种公司培训后,离目标翻译也占主导地位。为解决这一问题,我们提议了一种在代表级别和梯度一级规范NMT模式的联合方法。在代表级别,我们利用一个辅助目标语言预测任务来规范脱coder输出以保留目标语言的信息。在梯度一级,我们利用少量的直接数据(千对词配对)来规范模型梯度。我们的结果表明,我们的方法非常有效,不仅减少了离目标翻译发生的情况,而且分别以+5.59和+10.38 BLEU在WMT和OPUS数据集上提高了零点翻译的性能。此外,实验表明,当少量的直接数据没有可用时,我们的方法也非常有效。

0

相关内容

正则化项

【教程】深度学习Keras与TensorFlow教程，Deep Learning with Keras and Tensorflow in R

【教程】深度学习Keras与TensorFlow教程，Deep Learning with Keras and Tensorflow in R

专知会员服务

32+阅读 · 2022年3月9日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于统一结构场模型的警务视频分析研究

国家自然科学基金

0+阅读 · 2015年12月31日

压缩感知与稀疏信号恢复

国家自然科学基金

2+阅读 · 2014年12月31日

复杂环境下交通视频分析的若干关键技术研究

国家自然科学基金

2+阅读 · 2013年12月31日

分数阶变分PDE图像复原关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

扩展的模糊逻辑与基于蕴涵算子的Rough逻辑

国家自然科学基金

0+阅读 · 2011年12月31日

基于肺结节多正交位CT图像Curvelet纹理构建 Gradient Boosting 集成预测模型

国家自然科学基金

0+阅读 · 2011年12月31日

Landau-Brazovsky模型约束最优问题

国家自然科学基金

0+阅读 · 2011年12月31日

基于图的统计机器翻译方法研究

国家自然科学基金

2+阅读 · 2010年12月31日

复杂疾病中的若干统计方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

组合导航系统中基于混沌、小波和神经网络的信息融合方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

You Are What You Write: Preserving Privacy in the Era of Large Language Models

You Are What You Write: Preserving Privacy in the Era of Large Language Models

Arxiv

0+阅读 · 2022年4月20日

Point-Level Region Contrast for Object Detection Pre-Training

Arxiv

1+阅读 · 2022年4月19日

MDQE: A More Accurate Direct Pretraining for Machine Translation Quality Estimation

Arxiv

0+阅读 · 2022年4月18日

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

Arxiv

0+阅读 · 2022年4月18日

Improving both domain robustness and domain adaptability in machine translation

Arxiv

0+阅读 · 2022年4月17日

Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Arxiv

1+阅读 · 2022年4月15日

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Arxiv

0+阅读 · 2022年4月15日

Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Arxiv

0+阅读 · 2022年4月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

BERT for Joint Intent Classification and Slot Filling

Arxiv

12+阅读 · 2019年2月28日

VIP会员

文章信息

相关主题

相关VIP内容

【教程】深度学习Keras与TensorFlow教程，Deep Learning with Keras and Tensorflow in R

【教程】深度学习Keras与TensorFlow教程，Deep Learning with Keras and Tensorflow in R

专知会员服务

32+阅读 · 2022年3月9日

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《生成式人工智能与大/小语言模型在供应链管理决策优化与可持续性提升中的作用评估》最新51页

白宫发布《赢得AI竞赛：美国人工智能行动计划》最新28页

地下战：地下空间的战略博弈

《美地下作战条令手册》228页

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

You Are What You Write: Preserving Privacy in the Era of Large Language Models

You Are What You Write: Preserving Privacy in the Era of Large Language Models

Arxiv

0+阅读 · 2022年4月20日

Point-Level Region Contrast for Object Detection Pre-Training

Arxiv

1+阅读 · 2022年4月19日

MDQE: A More Accurate Direct Pretraining for Machine Translation Quality Estimation

Arxiv

0+阅读 · 2022年4月18日

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

Arxiv

0+阅读 · 2022年4月18日

Improving both domain robustness and domain adaptability in machine translation

Arxiv

0+阅读 · 2022年4月17日

Just Fine-tune Twice: Selective Differential Privacy for Large Language Models

Arxiv

1+阅读 · 2022年4月15日

Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning

Arxiv

0+阅读 · 2022年4月15日

Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Arxiv

0+阅读 · 2022年4月15日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

BERT for Joint Intent Classification and Slot Filling

Arxiv

12+阅读 · 2019年2月28日

相关基金

基于统一结构场模型的警务视频分析研究

国家自然科学基金

0+阅读 · 2015年12月31日

压缩感知与稀疏信号恢复

国家自然科学基金

2+阅读 · 2014年12月31日

复杂环境下交通视频分析的若干关键技术研究

国家自然科学基金

2+阅读 · 2013年12月31日

分数阶变分PDE图像复原关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

扩展的模糊逻辑与基于蕴涵算子的Rough逻辑

国家自然科学基金

0+阅读 · 2011年12月31日

基于肺结节多正交位CT图像Curvelet纹理构建 Gradient Boosting 集成预测模型

国家自然科学基金

0+阅读 · 2011年12月31日

Landau-Brazovsky模型约束最优问题

国家自然科学基金

0+阅读 · 2011年12月31日

基于图的统计机器翻译方法研究

国家自然科学基金

2+阅读 · 2010年12月31日

复杂疾病中的若干统计方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

组合导航系统中基于混沌、小波和神经网络的信息融合方法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员