神经机器翻译的可控双重扭曲差异损失 (Controllable Dual Skew Divergence Loss for Neural Machine Translation) - 专知论文

会员服务 ·

0

散度 · Performer · Machine Translation · 损失 · Weight ·

2021 年 4 月 17 日

Controllable Dual Skew Divergence Loss for Neural Machine Translation

翻译：神经机器翻译的可控双重扭曲差异损失

Zuchao Li,Hai Zhao,Yingting Wu,Fengshun Xiao,Shu Jiang

In sequence prediction tasks like neural machine translation, training with cross-entropy loss often leads to models that overgeneralize and plunge into local optima. In this paper, we propose an extended loss function called \emph{dual skew divergence} (DSD) that integrates two symmetric terms on KL divergences with a balanced weight. We empirically discovered that such a balanced weight plays a crucial role in applying the proposed DSD loss into deep models. Thus we eventually develop a controllable DSD loss for general-purpose scenarios. Our experiments indicate that switching to the DSD loss after the convergence of ML training helps models escape local optima and stimulates stable performance improvements. Our evaluations on the WMT 2014 English-German and English-French translation tasks demonstrate that the proposed loss as a general and convenient mean for NMT training indeed brings performance improvement in comparison to strong baselines.

翻译：在神经机翻译等序列预测任务中,关于跨孔径损失的培训往往导致过度概括和跳入本地opima的模型。在本文中,我们提议一个称为\emph{doal skew difference}(DSD)的延长损失函数,将KL差异的两个对称术语与平衡重量结合起来。我们从经验中发现,这种平衡加权对于将拟议的DSD损失应用到深层模型中起着关键作用。因此,我们最终为通用情景开发了可控的DSD损失。我们的实验表明,在ML培训趋同后转至DSD损失有助于模型摆脱本地opima,刺激稳定的性能改进。我们对2014 WMT 英文、德文和英文-法文翻译任务的评估表明,拟议的损失作为NMT培训的一般和方便手段,确实比强的基线提高了绩效。

0

相关内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

最新《机器翻译》进展报告，纽约大学Kyunghyun Cho讲解，附50页ppt

专知会员服务

30+阅读 · 2021年1月25日

威斯康辛大学《机器学习导论》2020秋季课程完结，课件、视频资源已开放

专知会员服务

16+阅读 · 2020年12月25日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

19+阅读 · 2020年11月17日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

【剑桥大学】神经机器翻译综述论文，Neural Machine Translation: A Review，附88页pdf

【剑桥大学】神经机器翻译综述论文，Neural Machine Translation: A Review，附88页pdf

专知会员服务

37+阅读 · 2019年12月4日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

自然语言处理（二）机器翻译篇 (NLP: machine translation)

自然语言处理（二）机器翻译篇 (NLP: machine translation)

DeepLearning中文论坛

12+阅读 · 2015年7月1日

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

Arxiv

3+阅读 · 2018年9月11日

Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model

Arxiv

3+阅读 · 2018年6月12日

Sockeye: A Toolkit for Neural Machine Translation

Arxiv

7+阅读 · 2018年6月1日

Scaling Neural Machine Translation

Arxiv

3+阅读 · 2018年6月1日

Inducing Grammars with and for Neural Machine Translation

Arxiv

4+阅读 · 2018年5月28日

A Stochastic Decoder for Neural Machine Translation

Arxiv

5+阅读 · 2018年5月28日

Sparse and Constrained Attention for Neural Machine Translation

Arxiv

4+阅读 · 2018年5月21日

Handling Homographs in Neural Machine Translation

Arxiv

3+阅读 · 2018年3月28日

Variational Recurrent Neural Machine Translation

Arxiv

5+阅读 · 2018年1月16日

Neural Machine Translation by Jointly Learning to Align and Translate

Arxiv

3+阅读 · 2016年5月19日

VIP会员

文章信息

相关主题

Machine Translation

相关VIP内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

最新《机器翻译》进展报告，纽约大学Kyunghyun Cho讲解，附50页ppt

专知会员服务

30+阅读 · 2021年1月25日

威斯康辛大学《机器学习导论》2020秋季课程完结，课件、视频资源已开放

专知会员服务

16+阅读 · 2020年12月25日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

19+阅读 · 2020年11月17日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

【剑桥大学】神经机器翻译综述论文，Neural Machine Translation: A Review，附88页pdf

【剑桥大学】神经机器翻译综述论文，Neural Machine Translation: A Review，附88页pdf

专知会员服务

37+阅读 · 2019年12月4日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

自然语言处理（二）机器翻译篇 (NLP: machine translation)

自然语言处理（二）机器翻译篇 (NLP: machine translation)

DeepLearning中文论坛

12+阅读 · 2015年7月1日

相关论文

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

Arxiv

3+阅读 · 2018年9月11日

Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model

Arxiv

3+阅读 · 2018年6月12日

Sockeye: A Toolkit for Neural Machine Translation

Arxiv

7+阅读 · 2018年6月1日

Scaling Neural Machine Translation

Arxiv

3+阅读 · 2018年6月1日

Inducing Grammars with and for Neural Machine Translation

Arxiv

4+阅读 · 2018年5月28日

A Stochastic Decoder for Neural Machine Translation

Arxiv

5+阅读 · 2018年5月28日

Sparse and Constrained Attention for Neural Machine Translation

Arxiv

4+阅读 · 2018年5月21日

Handling Homographs in Neural Machine Translation

Arxiv

3+阅读 · 2018年3月28日

Variational Recurrent Neural Machine Translation

Arxiv

5+阅读 · 2018年1月16日

Neural Machine Translation by Jointly Learning to Align and Translate

Arxiv

3+阅读 · 2016年5月19日

微信扫码咨询专知VIP会员