Using pre-trained word embeddings as the input layer is common practice in many natural language processing (NLP) tasks, but it is largely neglected for neural machine translation (NMT). In this paper, we conducted a systematic analysis of the effect of using pre-trained source-side monolingual word embeddings in NMT. We compared several strategies, such as fixing or updating the embeddings during NMT training on varying amounts of data, and we also proposed a novel strategy, called dual-embedding, that blends the fixing and updating strategies. Our results suggest that pre-trained embeddings can be helpful if properly incorporated into NMT, especially when parallel data is limited or additional in-domain monolingual data is readily available.
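The abstract does not specify how the dual-embedding strategy blends the fixed and updated embeddings. A minimal PyTorch-style sketch of one plausible formulation is given below, assuming the two branches are concatenated along the feature dimension; the class name `DualEmbedding` and the concatenation choice are illustrative assumptions, not the paper's confirmed design.

```python
import torch
import torch.nn as nn


class DualEmbedding(nn.Module):
    """Hypothetical dual-embedding input layer: a frozen pre-trained
    branch combined with a trainable branch (blending by concatenation
    is an assumption for illustration)."""

    def __init__(self, pretrained_weights: torch.Tensor, trainable_dim: int):
        super().__init__()
        vocab_size, _ = pretrained_weights.shape
        # Fixed branch: initialized from pre-trained vectors, never updated.
        self.fixed = nn.Embedding.from_pretrained(pretrained_weights, freeze=True)
        # Updated branch: randomly initialized, trained jointly with the NMT model.
        self.updated = nn.Embedding(vocab_size, trainable_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Concatenate the two representations per token.
        return torch.cat([self.fixed(token_ids), self.updated(token_ids)], dim=-1)


# Usage sketch: pre-trained 300-dim vectors plus a 100-dim trainable branch.
pretrained = torch.randn(10000, 300)  # placeholder for real embeddings
layer = DualEmbedding(pretrained, trainable_dim=100)
out = layer(torch.tensor([[1, 5, 42]]))  # shape: (1, 3, 400)
```

Under this reading, the fixed branch preserves the distributional knowledge from monolingual data while the updated branch lets the model adapt representations to the translation task.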