Recent studies have demonstrated the overwhelming advantage of cross-lingual pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual NLP tasks. However, existing approaches essentially capture only the co-occurrence among tokens, since they rely on the masked language model (MLM) objective with token-level cross entropy. In this work, we extend these approaches to learn sentence-level representations and demonstrate their effectiveness on cross-lingual understanding and generation. Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to (1) learn universal representations for parallel sentences in one or multiple languages and (2) distinguish, for each sentence, the semantically related words from a shared cross-lingual vocabulary. We conduct evaluations on two challenging cross-lingual tasks: the XTREME benchmark and machine translation. Experimental results show that HiCTL outperforms the state-of-the-art XLM-R by an absolute gain of 4.2% accuracy on the XTREME benchmark and achieves substantial improvements over strong baselines on both high-resource and low-resource English-to-X translation tasks.
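To make the sentence-level objective concrete, below is a minimal sketch of an InfoNCE-style contrastive loss over parallel sentence pairs, of the kind the abstract describes. The exact HiCTL formulation (its negative sampling, temperature, and the word-level term) is not specified here, so the function name, arguments, and details are illustrative assumptions rather than the authors' implementation.

```python
# Sketch only: assumes each source sentence's translation is the positive and the
# other in-batch target sentences serve as negatives; this is not the official HiCTL code.
import torch
import torch.nn.functional as F

def sentence_contrastive_loss(src_repr: torch.Tensor,
                              tgt_repr: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """src_repr, tgt_repr: [batch, dim] sentence embeddings of aligned parallel pairs."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature                     # [batch, batch] similarities
    labels = torch.arange(src.size(0), device=src.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Usage with random embeddings standing in for encoder outputs of an aligned batch:
loss = sentence_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```

A word-level term of the hierarchical objective could be built analogously, contrasting each sentence representation against related versus unrelated words from the shared cross-lingual vocabulary.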