Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, conventional cascaded systems cannot exploit end-to-end training data, meaning that the training data best suited for the task remains unused. Previous studies have suggested several approaches for integrated end-to-end training to overcome these problems; however, they mostly rely on (synthetic or natural) three-way data. We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation. This new architecture (i) avoids unnecessary early decisions that can cause errors which are then propagated throughout the cascaded models and (ii) utilizes the end-to-end training data directly. We conduct an evaluation on two pivot-based machine translation tasks, namely French-German and German-Czech. Our experimental results show that the proposed architecture yields an improvement of more than 2 BLEU for French-German over the cascaded baseline.
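The key contrast between a hard cascade and the soft, end-to-end-trainable coupling can be illustrated with a minimal NumPy sketch. This is not the paper's actual model; the logits, pivot vocabulary size, and embedding table below are hypothetical. A conventional cascade commits to the argmax pivot token (an early decision that discards a near-tied alternative), whereas a soft coupling feeds the second model the expected pivot embedding under the full distribution, which keeps competing hypotheses alive and is differentiable end-to-end.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
V_pivot, d = 5, 4                      # hypothetical pivot vocabulary size and embedding dim
E = rng.normal(size=(V_pivot, d))      # hypothetical pivot embedding table of the second model

# First-stage model output: logits over the pivot vocabulary,
# with two nearly tied candidate pivot tokens.
pivot_logits = np.array([1.0, 0.9, -2.0, -2.0, -2.0])

# Cascaded baseline: hard argmax -> a single embedding enters the second model.
# The near-tied second hypothesis is discarded (early decision, not differentiable).
hard_in = E[np.argmax(pivot_logits)]

# Soft coupling: the expected embedding under the softmax distribution
# enters the second model, preserving both hypotheses and allowing
# gradients to flow from the final loss back into the first stage.
p = softmax(pivot_logits)
soft_in = p @ E
```

Because `soft_in` is a smooth function of `pivot_logits`, a loss computed on the final output can be backpropagated through the pivot into the first model, which is what makes end-to-end training of the cascade possible without an explicit intermediate representation.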