We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural machine translation. Specifically, we bidirectionally update the model parameters at the early stage of training and then tune the model as usual. To achieve bidirectional updating, we simply reconstruct the training samples from "src$\rightarrow$tgt" to "src+tgt$\rightarrow$tgt+src" without any complicated model modifications. Notably, our approach does not increase the number of parameters or training steps, requiring only the parallel data. Experimental results show that BiT significantly improves state-of-the-art (SOTA) neural machine translation performance across 15 translation tasks on 8 language pairs (with data sizes ranging from 160K to 38M). Encouragingly, our proposed model can complement existing data manipulation strategies, i.e., back-translation, data distillation, and data diversification. Extensive analyses show that our approach functions as a novel bilingual code-switcher, obtaining better bilingual alignment.
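As a concrete illustration of the "src+tgt$\rightarrow$tgt+src" reconstruction, the following is a minimal sketch (not the authors' released code) assuming the bidirectional pretraining data is simply the union of the parallel corpus in both translation directions; the helper name `build_bidirectional_corpus` is hypothetical.

```python
def build_bidirectional_corpus(pairs):
    """Reconstruct training samples for bidirectional pretraining.

    pairs: list of (src_sentence, tgt_sentence) string tuples.
    Returns a list containing every pair in both directions, so the
    model is updated on "src -> tgt" and "tgt -> src" samples alike.
    """
    bidirectional = []
    for src, tgt in pairs:
        bidirectional.append((src, tgt))  # original direction: src -> tgt
        bidirectional.append((tgt, src))  # reversed direction: tgt -> src
    return bidirectional


if __name__ == "__main__":
    # Toy example: one German-English pair becomes two training samples.
    toy_pairs = [("ich bin müde", "i am tired")]
    for s, t in build_bidirectional_corpus(toy_pairs):
        print(f"{s}\t{t}")
```

After this early bidirectional stage, the reconstructed corpus would be dropped and the model tuned on the original "src$\rightarrow$tgt" data as usual.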