On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation

Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT). On resource-rich NMT, however, it usually fails to achieve notable gains over its Random-Initialization (RI) counterpart, and is sometimes even worse. We take a first step toward investigating the complementarity between PT and RI in resource-rich scenarios via two probing analyses, and find that: 1) PT improves not the accuracy but the generalization, by achieving flatter loss landscapes than RI; 2) PT improves not the confidence of lexical choice but the negative diversity, by assigning smoother lexical probability distributions than RI. Based on these insights, we propose to combine their complementary strengths with a model fusion algorithm that uses optimal transport to align neurons between PT and RI. Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI complement each other well, achieving substantial improvements in translation accuracy, generalization, and negative diversity. Probing tools and code are released at: https://github.com/zanchangtong/PTvsRI.
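To make the idea of aligning neurons with optimal transport before fusing concrete, here is a minimal single-layer sketch. It is not taken from the released code: the function names `sinkhorn_plan` and `fuse_layer`, the squared-Euclidean cost, the uniform mass over neurons, the entropic (Sinkhorn) solver, and the equal 0.5/0.5 averaging are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.01, n_iters=200):
    """Entropy-regularised transport plan between two uniform distributions.

    cost has shape (n, m); the returned plan has the same shape and its
    rows sum to 1/n (columns approximately to 1/m).
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on source neurons (assumption)
    b = np.full(m, 1.0 / m)          # uniform mass on target neurons (assumption)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def fuse_layer(w_pt, w_ri, reg=0.01):
    """Align the rows (neurons) of w_ri to those of w_pt, then average.

    w_pt: (n_pt, d) weights of one layer of the PT model.
    w_ri: (n_ri, d) weights of the same layer of the RI model.
    """
    # Pairwise squared Euclidean distance between RI and PT neurons.
    diff = w_ri[:, None, :] - w_pt[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)   # normalise so reg is scale-free

    plan = sinkhorn_plan(cost, reg=reg)    # shape (n_ri, n_pt)

    # Barycentric projection: express RI neurons in the PT neuron order.
    aligned_ri = (plan.T @ w_ri) / plan.sum(axis=0)[:, None]

    # Equal-weight average of the aligned models (an illustrative choice).
    return 0.5 * (w_pt + aligned_ri)
```

If `w_ri` is just a row-permuted copy of `w_pt`, the alignment step undoes the permutation and the fused layer stays close to `w_pt`, whereas a naive average of the unaligned weights does not. In a full network the input dimension of the following layer would also have to be re-mapped with the same plan, which this sketch omits.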

