Pretrained multilingual models have become the de facto default approach for zero-shot cross-lingual transfer. Previous work has shown that these models can learn cross-lingual representations when pretrained on two or more languages with shared parameters. In this work, we provide evidence that a model can achieve language-agnostic representations even when pretrained on a single language. That is, we find that monolingual models pretrained and finetuned on different languages achieve competitive performance compared to models pretrained and finetuned on the same target language. Surprisingly, the models perform similarly on the same task regardless of the pretraining language. For example, models pretrained on distant languages such as German and Portuguese perform similarly on English tasks.
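To make the setup concrete, here is a minimal sketch of the kind of cross-language pretrain/finetune experiment the abstract describes: a model pretrained only on German is finetuned on an English task. This assumes a standard HuggingFace finetuning pipeline; the checkpoint (`bert-base-german-cased`), the task (SST-2), and the hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: finetune a German-pretrained monolingual model on an English task
# (SST-2 sentiment classification). Checkpoint, task, and hyperparameters are
# illustrative assumptions, not the paper's experimental setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Monolingual model pretrained only on German text.
checkpoint = "bert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# English downstream task: SST-2 from GLUE.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="german-bert-on-sst2",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)

trainer.train()
# Compare this score against an English-pretrained baseline finetuned the same way.
print(trainer.evaluate())
```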