BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation (NMT) systems. However, the proposed methods for incorporating pre-trained models are non-trivial and mainly focus on BERT, so the impact that other pre-trained models may have on translation performance has not been compared. In this paper, we demonstrate that simply using the output (contextualized embeddings) of a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) as the input of the NMT encoder achieves state-of-the-art translation performance. Moreover, we propose a stochastic layer selection approach and the concept of a dual-directional translation model to ensure sufficient utilization of the contextualized embeddings. Without using back translation, our best models achieve BLEU scores of 30.45 for En->De and 38.61 for De->En on the IWSLT'14 dataset, and 31.26 for En->De and 34.94 for De->En on the WMT'14 dataset, which exceed all published numbers.
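The abstract names stochastic layer selection without describing its mechanics. One plausible reading, offered here only as a sketch and not as the authors' implementation, is that the contextualized embeddings fed to the NMT encoder are taken from one randomly chosen layer among the last K layers of the pre-trained model during training, and from the average of those K layers at inference. The function name `select_layer_output`, the parameter `k`, and the nested-list representation of layer outputs are all assumptions made for illustration.

```python
import random


def select_layer_output(layer_outputs, k, training, rng=random):
    """Pick the embeddings to pass on to the NMT encoder (hypothetical helper).

    layer_outputs: list with one entry per pre-trained layer; each entry is
                   a list of token vectors (lists of floats) of equal shape.
    k:             number of top layers to choose from (assumed hyperparameter).
    training:      True  -> sample one of the last k layers uniformly at random.
                   False -> return the element-wise mean of the last k layers.
    """
    if not 1 <= k <= len(layer_outputs):
        raise ValueError("k must be between 1 and the number of layers")

    candidates = layer_outputs[-k:]

    if training:
        # Assumed training behaviour: one randomly selected layer per call.
        return rng.choice(candidates)

    # Assumed inference behaviour: average the candidate layers per token
    # and per dimension, which removes the randomness.
    num_tokens = len(candidates[0])
    dim = len(candidates[0][0])
    return [
        [sum(layer[t][d] for layer in candidates) / k for d in range(dim)]
        for t in range(num_tokens)
    ]


# Toy input: three "layers", one token each, two dimensions per token.
layers = [[[0.0, 0.0]], [[1.0, 2.0]], [[3.0, 4.0]]]
print(select_layer_output(layers, k=2, training=False))  # [[2.0, 3.0]]
```

Averaging at inference is one way to keep the output deterministic while staying close to what the encoder saw in training; whether the paper makes exactly this choice should be checked against the full text.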


