In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual multi-granularity texts. This unified view helps us better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than those of negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
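To make the contrastive objective concrete, below is a minimal sketch (not the released InfoXLM implementation) of an InfoNCE-style loss over bilingual sentence pairs. The encoder outputs `src_repr`/`tgt_repr`, the in-batch negative sampling, and the temperature value are illustrative assumptions; the paper's actual task additionally uses techniques such as momentum-updated negative queues.

```python
# Minimal sketch (assumption, not the authors' code): an InfoNCE-style contrastive
# loss over bilingual sentence pairs, where `src_repr` and `tgt_repr` are [batch, dim]
# encodings of parallel sentences produced by a shared multilingual encoder.
import torch
import torch.nn.functional as F

def cross_lingual_contrastive_loss(src_repr, tgt_repr, temperature=0.1):
    """Treat each translation pair as two views of the same meaning:
    the paired target sentence is the positive, and the other target
    sentences in the batch serve as negatives (in-batch negatives)."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    # Similarity of every source sentence to every target sentence in the batch.
    logits = src @ tgt.t() / temperature          # [batch, batch]
    labels = torch.arange(src.size(0), device=src.device)
    # Cross-entropy pushes the diagonal (true pairs) above the off-diagonal
    # negatives, maximizing a lower bound on the mutual information between views.
    return F.cross_entropy(logits, labels)

# Usage sketch with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    src = torch.randn(8, 768)
    tgt = torch.randn(8, 768)
    print(cross_lingual_contrastive_loss(src, tgt).item())
```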