Conditional masked language models (CMLMs) have shown impressive progress in non-autoregressive machine translation (NAT). They learn a conditional translation model by predicting a randomly masked subset of tokens in the target sentence. Building on the CMLM framework, we introduce Multi-view Subset Regularization (MvSR), a novel regularization method that improves the performance of NAT models. Specifically, MvSR consists of two parts: (1) \textit{shared mask consistency}: we forward the same target with different mask strategies and encourage the predictions at shared masked positions to be consistent with each other; (2) \textit{model consistency}: we maintain an exponential moving average of the model weights and encourage the predictions of the average model and the online model to be consistent. Without changing the CMLM-based architecture, our approach achieves remarkable performance on three public benchmarks, with gains of 0.36-1.14 BLEU over previous NAT models. Moreover, compared with the stronger Transformer baseline, we reduce the gap to 0.01-0.44 BLEU on the small datasets (WMT16 RO$\leftrightarrow$EN and IWSLT DE$\rightarrow$EN).
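For concreteness, the following is a minimal PyTorch sketch of one MvSR training step as described above. The model interface (`model(src, masked_tgt)` returning per-position logits), the masking helper, the loss weights `alpha`/`beta`, and the EMA decay are illustrative assumptions for this sketch, not details taken from the paper.

```python
# Minimal sketch of one MvSR training step under the CMLM framework.
# `model` is an assumed CMLM-style module mapping (src, masked_tgt) ->
# logits of shape [batch, tgt_len, vocab]. All hyperparameters below
# (mask_id, pad_id, alpha, beta, decay) are hypothetical placeholders.
import copy
import torch
import torch.nn.functional as F

def random_mask(tgt, mask_id, pad_id):
    """CMLM-style masking: replace a random subset of non-pad tokens."""
    lengths = (tgt != pad_id).sum(-1, keepdim=True)
    ratio = torch.rand(tgt.size(0), 1, device=tgt.device)
    cutoff = (ratio * lengths).long().clamp(min=1)   # mask at least one token
    scores = torch.rand(tgt.shape, device=tgt.device)
    scores = scores.masked_fill(tgt == pad_id, 2.0)  # never mask padding
    ranks = scores.argsort(-1).argsort(-1)           # per-row rank of each score
    mask = ranks < cutoff                            # keep lowest-ranked positions
    return tgt.masked_fill(mask, mask_id), mask

def symmetric_kl(logits_a, logits_b):
    """Symmetric KL divergence between two predictive distributions."""
    la, lb = F.log_softmax(logits_a, -1), F.log_softmax(logits_b, -1)
    return 0.5 * (F.kl_div(la, lb, log_target=True, reduction="batchmean")
                  + F.kl_div(lb, la, log_target=True, reduction="batchmean"))

def mvsr_step(model, ema_model, src, tgt,
              mask_id=3, pad_id=1, alpha=1.0, beta=1.0, decay=0.999):
    # Two views of the same target under independent random mask strategies.
    in1, m1 = random_mask(tgt, mask_id, pad_id)
    in2, m2 = random_mask(tgt, mask_id, pad_id)
    logits1, logits2 = model(src, in1), model(src, in2)

    # Standard CMLM cross-entropy on each view's masked positions.
    ce = F.cross_entropy(logits1[m1], tgt[m1]) + F.cross_entropy(logits2[m2], tgt[m2])

    # (1) Shared mask consistency: agreement at positions masked in BOTH views.
    shared = m1 & m2
    smc = symmetric_kl(logits1[shared], logits2[shared]) if shared.any() \
        else logits1.new_zeros(())

    # (2) Model consistency: match the EMA ("average") model's predictions.
    with torch.no_grad():
        ema_logits = ema_model(src, in1)
    mc = F.kl_div(F.log_softmax(logits1[m1], -1),
                  F.log_softmax(ema_logits[m1], -1),
                  log_target=True, reduction="batchmean")

    loss = ce + alpha * smc + beta * mc
    loss.backward()

    # Update the exponential moving average of the online weights.
    with torch.no_grad():
        for p_ema, p in zip(ema_model.parameters(), model.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
    return loss

# Usage: ema_model = copy.deepcopy(model); then call mvsr_step per batch.
```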