Recent work in simultaneous machine translation is often trained on conventional full-sentence translation corpora, leading to either excessive latency or the need to anticipate as-yet-unarrived words when dealing with a language pair whose word orders differ significantly. This is unlike human simultaneous interpreters, who produce largely monotonic translations at the expense of the grammaticality of the sentence being translated. In this paper, we therefore propose an algorithm that reorders and refines the target side of a full-sentence translation corpus, using word alignment and non-autoregressive neural machine translation, so that the words/phrases of the source and target sentences are aligned largely monotonically. We then train a widely used wait-k simultaneous translation model on this reordered-and-refined corpus. The proposed approach improves BLEU scores, and the resulting translations exhibit enhanced monotonicity with respect to the source sentences.
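The two core ideas in the abstract can be illustrated with a minimal sketch (not the authors' implementation): first, reordering target tokens so their order follows the source-side positions of the words they align to, given word-alignment pairs from an external aligner; second, the read/write schedule of the wait-k policy, under which the i-th target token is emitted only after i+k source tokens have been read. All function names and the tie-breaking rule for unaligned tokens are illustrative assumptions.

```python
def monotonic_reorder(tgt_tokens, alignment):
    """Reorder target tokens to follow source word order.

    tgt_tokens: list of target-side tokens.
    alignment:  list of (src_idx, tgt_idx) pairs from a word aligner.
    Each target token is keyed by the smallest source index it aligns
    to; unaligned tokens inherit the previous token's key so they stay
    local (a hypothetical tie-breaking choice, not from the paper).
    """
    key = [None] * len(tgt_tokens)
    for s, t in alignment:
        if key[t] is None or s < key[t]:
            key[t] = s
    last = 0
    for i in range(len(tgt_tokens)):
        if key[i] is None:
            key[i] = last
        else:
            last = key[i]
    # Stable sort by source position, then original target position.
    order = sorted(range(len(tgt_tokens)), key=lambda i: (key[i], i))
    return [tgt_tokens[i] for i in order]


def wait_k_schedule(src_len, tgt_len, k):
    """Wait-k read/write schedule: for each target position i
    (0-indexed), the number of source tokens read before emitting
    target token i, capped at the source length."""
    return [min(i + k, src_len) for i in range(tgt_len)]
```

For example, with target tokens ["C", "A", "B"] aligned to source positions 2, 0, and 1 respectively, `monotonic_reorder` yields ["A", "B", "C"]; and `wait_k_schedule(5, 5, 2)` shows the decoder lagging two tokens behind the source until it is exhausted.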