Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings

In neural machine translation, a source sequence of words is encoded into a vector from which a target sequence is generated in the decoding phase. Unlike in statistical machine translation, the associations between source words and their possible target counterparts are not explicitly stored. Source and target words sit at the two ends of a long information processing procedure, mediated by hidden states in both the source encoding and the target decoding phases. As a result, a source word can be incorrectly translated into a target word that is not one of its admissible equivalents in the target language. In this paper, we seek to shorten the distance between source and target words in that procedure, and thus strengthen their association, by means of a method we term bridging source and target word embeddings. We experiment with three strategies: (1) a source-side bridging model, where source word embeddings are moved one step closer to the output target sequence; (2) a target-side bridging model, which explores the more relevant source word embeddings for the prediction of the target sequence; and (3) a direct bridging model, which directly connects source and target word embeddings, seeking to minimize errors in translating one into the other. Experiments and analysis presented in this paper demonstrate that the proposed bridging models significantly improve the quality of sentence translation in general, and of the alignment and translation of individual source words with target words in particular.
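The abstract does not give the formulation of any of the three models, so the following is only one possible reading of the direct bridging idea, not the authors' method. It assumes a hypothetical linear map `W` from target embedding space to source embedding space, picks for each target position the most-attended source position, and penalizes the squared distance between the mapped target embedding and that source embedding. The names `matvec`, `direct_bridging_loss`, `W`, and the attention layout are all assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def direct_bridging_loss(W, src_emb, tgt_emb, attention):
    """Hypothetical auxiliary loss linking target and source embeddings.

    W         : assumed linear map, target space -> source space
    src_emb   : one embedding per source position
    tgt_emb   : one embedding per target position
    attention : attention[t][s] = weight of source position s at target step t
    """
    total = 0.0
    for t, weights in enumerate(attention):
        # Treat the most-attended source word as the aligned one (an assumption).
        s = max(range(len(weights)), key=weights.__getitem__)
        mapped = matvec(W, tgt_emb[t])
        total += sum((m - x) ** 2 for m, x in zip(mapped, src_emb[s]))
    return total / len(attention)


# Toy usage with 2-d embeddings and an identity map.
identity = [[1.0, 0.0], [0.0, 1.0]]
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 1.0], [1.0, 0.0]]
aligned = [[0.1, 0.9], [0.8, 0.2]]     # target 0 -> source 1, target 1 -> source 0
misaligned = [[0.9, 0.1], [0.2, 0.8]]  # target 0 -> source 0, target 1 -> source 1
print(direct_bridging_loss(identity, src, tgt, aligned))
print(direct_bridging_loss(identity, src, tgt, misaligned))
```

In a real system a term like this would presumably be added to the usual translation loss and `W` would be learned jointly; the toy values here only show that the penalty is zero when attended source words match the mapped target embeddings and grows when they do not.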


Related content

Word embeddings

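A distributed representation encodes language as dense, low-dimensional, continuous vectors. Researchers first observed that learned word embeddings exhibit analogy relations, for example apple − apples ≈ car − cars and man − woman ≈ king − queen. These methods can all be trained directly on large-scale unlabeled corpora. The quality of word embeddings also depends heavily on the choice of context window size. In general, embeddings learned with a large context window reflect more topical information, while those learned with a small context window reflect more of a word's function and its contextual semantics.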
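The analogy relation can be illustrated with vector arithmetic. The sketch below uses hand-made three-dimensional toy vectors (an assumption for illustration, not trained embeddings) and answers "a is to b as c is to ?" by finding the word whose vector is closest in cosine similarity to b − a + c. The names `TOY_VECTORS`, `cosine`, and `analogy` are hypothetical.

```python
import math

# Hand-made toy vectors with dimensions (royalty, male, female).
# These are illustrative values, not learned embeddings.
TOY_VECTORS = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "child": [0.1, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Return the word d such that a : b is closest to c : d."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, as is common in analogy tests.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy(TOY_VECTORS, "man", "woman", "king"))
```

With trained embeddings the match is only approximate, which is why the nearest neighbor by cosine similarity is used rather than an exact equality check.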
