Transformers have achieved great success in machine translation, but transformer-based NMT models often require millions of bilingual parallel sentence pairs for training. In this paper, we propose a novel architecture named attention link (AL) to improve the performance of transformer models, especially under low training resources. We theoretically demonstrate the superiority of our attention link architecture in low-resource settings. In addition, we conduct extensive experiments, including en-de, de-en, en-fr, en-it, it-en, and en-ro translation tasks on the IWSLT14 dataset, as well as real low-resource scenarios on bn-gu and gu-ta translation tasks on the CVIT PIB dataset. All experimental results show that our attention link is powerful and leads to significant improvements. Furthermore, by combining our attention link with other advanced methods, we achieve a BLEU score of 37.9 on the IWSLT14 de-en task, a new state-of-the-art result.