Zero-shot cross-lingual transfer is promising, but has been shown to be sub-optimal, with inferior transfer performance for low-resource languages. In this work, we treat languages as domains for improving zero-shot transfer by jointly reducing the feature incongruity between the source and the target language and increasing the generalization capabilities of pre-trained multilingual transformers. We show that our approach, DiTTO, significantly outperforms the standard zero-shot fine-tuning method on multiple datasets across all languages using solely unlabeled instances in the target language. Empirical results show that jointly reducing feature incongruity for multiple target languages is vital for successful cross-lingual transfer. Moreover, our model enables better cross-lingual transfer than standard fine-tuning methods, even in the few-shot setting.
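To make the notion of reducing feature incongruity concrete, the sketch below shows one common way such an alignment term can be computed from unlabeled target-language data: a linear-kernel MMD between batches of source- and target-language sentence embeddings. This is an illustrative assumption only; the abstract does not specify DiTTO's actual objective, and the encoder outputs, dimensions, and weighting coefficient `lambda_align` here are hypothetical.

```python
# Minimal sketch of an alignment penalty between source- and target-language
# representations, assuming access to batches of sentence embeddings from a
# multilingual encoder. NOT the DiTTO objective; purely illustrative.
import torch


def linear_mmd(source_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """Squared distance between the mean source and mean target embeddings."""
    delta = source_feats.mean(dim=0) - target_feats.mean(dim=0)
    return (delta * delta).sum()


# Example usage with random stand-ins for encoder outputs of shape (batch, hidden_dim).
src = torch.randn(16, 768)   # labeled source-language batch (e.g., English)
tgt = torch.randn(16, 768)   # unlabeled target-language batch
alignment_loss = linear_mmd(src, tgt)

# In training, such a term would typically be added to the supervised task loss,
# e.g. total_loss = task_loss + lambda_align * alignment_loss (lambda_align tuned).
print(alignment_loss.item())
```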