Cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask tokens in a bitext pair. Given a masked token, the model uses a pointer network to predict the aligned token in the other language. We alternately perform the above two steps in an expectation-maximization manner. Experimental results show that our method improves cross-lingual transferability on various datasets, especially on token-level tasks such as question answering and structured prediction. Moreover, the model can serve as a pretrained word aligner, which achieves reasonably low error rates on the alignment benchmarks. The code and pretrained parameters are available at https://github.com/CZWin32768/XLM-Align.
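The following is a minimal sketch, not the authors' implementation, of the denoising word alignment step described above: given encoder hidden states for a bitext pair in which some source tokens are masked, a pointer network scores positions in the other-language sentence and is trained to point at the self-labeled aligned token. All names here (AlignmentPointer, hidden_dim, the query/key projections) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentPointer(nn.Module):
    """Scores target positions for each masked source token (pointer network)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Separate projections for the masked (query) tokens and candidate target tokens.
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)
        self.key_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, masked_states: torch.Tensor, target_states: torch.Tensor) -> torch.Tensor:
        # masked_states: (num_masked, hidden_dim) hidden vectors of masked source tokens
        # target_states: (tgt_len, hidden_dim) hidden vectors of the other-language sentence
        q = self.query_proj(masked_states)            # (num_masked, hidden_dim)
        k = self.key_proj(target_states)              # (tgt_len, hidden_dim)
        scores = q @ k.t() / k.size(-1) ** 0.5        # (num_masked, tgt_len) pointer logits
        return scores

def denoising_word_alignment_loss(pointer, masked_states, target_states, aligned_index):
    # aligned_index: (num_masked,) self-labeled target positions for each masked token
    logits = pointer(masked_states, target_states)
    return F.cross_entropy(logits, aligned_index)

# Toy usage: 3 masked tokens pointing into an 8-token target sentence.
hidden_dim = 16
pointer = AlignmentPointer(hidden_dim)
masked_states = torch.randn(3, hidden_dim)
target_states = torch.randn(8, hidden_dim)
aligned_index = torch.tensor([2, 5, 7])   # stand-in for self-labeled alignments
loss = denoising_word_alignment_loss(pointer, masked_states, target_states, aligned_index)
loss.backward()
```

In the alternation described in the abstract, the current model would first self-label the alignments (the targets stored here in aligned_index), and the pointer loss would then be used to update the encoder; the exact self-labeling procedure is not specified in this sketch.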