End-to-End Training of Both Translation Models in the Back-Translation Framework

Semi-supervised learning algorithms in neural machine translation (NMT) have significantly improved translation quality compared with supervised learning algorithms by using additional monolingual corpora. Among them, back-translation is a theoretically well-structured and cutting-edge method. Given two pre-trained NMT models between the source and target languages, one translates a monolingual sentence into a latent sentence, and the other reconstructs the monolingual input sentence from that latent sentence. This structure resembles an auto-encoder, so previous works tried to apply the variational auto-encoder (VAE) training framework to back-translation. However, the discrete nature of the latent sentence made it impossible to use backpropagation in the framework. This paper proposes a categorical reparameterization trick that generates a differentiable sentence, with which we practically implement the VAE training framework for back-translation and train it by end-to-end backpropagation. In addition, we propose several regularization techniques that are especially advantageous to this framework. In our experiments, we demonstrate that our method makes backpropagation available through the latent sentences and improves the BLEU scores on the datasets of the WMT18 translation task.
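The abstract does not spell out the form of its categorical reparameterization trick. As a rough illustration of the general idea only, the sketch below uses the well-known Gumbel-softmax relaxation, which is an assumption here and not necessarily the authors' method. It replaces a discrete token choice with a soft distribution over the vocabulary, so the "token" passed to the second model becomes a weighted mix of embeddings that gradients could flow through. The names `gumbel_softmax` and `soft_embedding` and the toy values are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed (soft) one-hot sample over the vocabulary.

    Assumption: Gumbel-softmax stands in for the paper's trick, whose
    exact form is not given in the abstract.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(probs, embedding_table):
    """Mix embedding rows by probability instead of picking one row."""
    dim = len(embedding_table[0])
    return [
        sum(p * row[d] for p, row in zip(probs, embedding_table))
        for d in range(dim)
    ]


# Toy example: a 3-word vocabulary with 2-dimensional embeddings.
rng = random.Random(0)
logits = [2.0, 0.5, -1.0]
table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = gumbel_softmax(logits, temperature=1.0, rng=rng)
mixed = soft_embedding(probs, table)
```

Lowering `temperature` pushes `probs` toward a one-hot vector, which trades smoother gradients for a closer match to a real discrete sentence.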


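Related content

Backpropagation

Strictly speaking, the term backpropagation refers only to the algorithm for computing the gradient, not to how the gradient is used. However, the term is often used loosely to refer to the entire learning algorithm, including how the gradient is used, for example by stochastic gradient descent. Backpropagation generalizes the gradient computation in the delta rule, which is the single-layer version of backpropagation, and is in turn generalized by automatic differentiation, where backpropagation is a special case of reverse accumulation (or "reverse mode"). In machine learning, backpropagation (backprop) is a widely used algorithm for training feedforward neural networks for supervised learning. Generalizations of backpropagation exist for other artificial neural networks (ANNs); this class of algorithms is generically called "backpropagation". The algorithm works by computing the gradient of the loss function with respect to each weight via the chain rule, one layer at a time, iterating backward from the last layer to avoid redundant computation of intermediate terms in the chain rule.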
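To make the layer-by-layer chain rule concrete, here is a minimal sketch on a two-weight scalar network, y = w2 * tanh(w1 * x), with a squared-error loss. The network, the function names, and the numbers are all illustrative assumptions. The point is that the gradient reaching the hidden value (`grad_h`) is computed once at the output layer and then reused when moving backward to the first layer.

```python
import math


def forward(x, w1, w2):
    """Two-layer scalar network: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y


def loss_and_grads(x, target, w1, w2):
    """Return the loss and its gradients with respect to w1 and w2."""
    h, y = forward(x, w1, w2)
    loss = 0.5 * (y - target) ** 2

    # Backward pass, starting at the last layer.
    grad_y = y - target            # dL/dy
    grad_w2 = grad_y * h           # dL/dw2
    grad_h = grad_y * w2           # dL/dh, reused by the earlier layer
    grad_z = grad_h * (1 - h * h)  # dL/dz for z = w1 * x (tanh derivative)
    grad_w1 = grad_z * x           # dL/dw1
    return loss, grad_w1, grad_w2


# One step of plain gradient descent on a single toy example.
x, target = 0.7, 0.3
w1, w2 = 0.5, -0.4
loss, grad_w1, grad_w2 = loss_and_grads(x, target, w1, w2)
learning_rate = 0.1
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
```

The last two lines are the "how the gradient is used" part that the strict definition of backpropagation excludes: computing `grad_w1` and `grad_w2` is backpropagation, and the update is gradient descent.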
