Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique. In this paper, we show that such a translation process can introduce subtle artifacts that have a notable impact on existing cross-lingual models. For instance, in natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them, which current models are highly sensitive to. We show that some previous findings in cross-lingual transfer learning need to be reconsidered in the light of this phenomenon. Based on the gained insights, we also improve the state of the art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
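The lexical-overlap effect mentioned above can be made concrete with a minimal sketch. The snippet below computes a simple Jaccard overlap over token sets; the sentence pairs are hypothetical examples (not from XNLI) illustrating how translating the two sides independently can yield different but valid word choices and thus lower overlap:

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Jaccard overlap between the token sets of two sentences."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    union = p | h
    return len(p & h) / len(union) if union else 0.0

# Hypothetical original pair: the two sides share many tokens.
orig = lexical_overlap("a man is playing a guitar",
                       "a man is playing an instrument")

# Hypothetical independently translated pair: each side was translated
# on its own, so the translator picked different (but valid) wordings.
trans = lexical_overlap("a man plays a guitar",
                        "someone is playing an instrument")

print(orig, trans)  # the translated pair has lower overlap
```

A model that has learned to use high overlap as a shortcut feature for the entailment label would then behave differently on the translated pair, even though its meaning is unchanged.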