The word alignment task, despite its prominence in the era of statistical machine translation (SMT), is niche and under-explored today. In this two-part tutorial, we argue for the continued relevance of word alignment. The first part provides historical background on word alignment as a core component of the traditional SMT pipeline. We zero in on GIZA++, an unsupervised, statistical word aligner with surprising longevity. Jumping forward to the era of neural machine translation (NMT), we show how insights from word alignment inspired the attention mechanism fundamental to present-day NMT. The second part shifts to a survey approach. We cover neural word aligners, showing the slow but steady progress towards surpassing GIZA++ performance. Finally, we cover the present-day applications of word alignment, from cross-lingual annotation projection to improving translation.