Multilingual pre-trained contextual embedding models (Devlin et al., 2019) have achieved impressive performance on zero-shot cross-lingual transfer tasks. Finding the most effective strategy for fine-tuning these models on high-resource languages so that they transfer well to the zero-shot languages is a non-trivial task. In this paper, we propose a novel meta-optimizer that soft-selects which layers of the pre-trained model to freeze during fine-tuning. We train the meta-optimizer by simulating the zero-shot transfer scenario. Results on cross-lingual natural language inference show that our approach improves over the simple fine-tuning baseline and X-MAML (Nooralahzadeh et al., 2020).
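To make the idea of soft layer selection concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: a learnable gate per transformer layer scales that layer's parameter update, so a gate near zero effectively freezes the layer while a gate near one lets it fine-tune freely. The names (`SoftLayerFreezer`, `apply_soft_freeze`), the plain SGD step, and the learning rate are our own illustrative assumptions; in the paper's setting the gates themselves would be trained on simulated zero-shot episodes (fine-tune on a high-resource language, evaluate on a held-out language, update the gates), which this sketch omits.

```python
import torch
import torch.nn as nn

class SoftLayerFreezer(nn.Module):
    """Illustrative sketch (not the paper's code): one learnable gate per
    layer; sigmoid(gate) in (0, 1) scales that layer's update, so a value
    near 0 approximates freezing and near 1 approximates full fine-tuning."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One logit per layer, initialized to 0 -> sigmoid = 0.5 (undecided).
        self.gate_logits = nn.Parameter(torch.zeros(num_layers))

    def gates(self) -> torch.Tensor:
        return torch.sigmoid(self.gate_logits)

def apply_soft_freeze(layer_params, freezer, lr=2e-5):
    """Scale each layer's gradient step by its gate value.

    `layer_params` is assumed to be a list of per-layer parameter lists,
    e.g. one list per transformer block of the pre-trained encoder."""
    g = freezer.gates()
    with torch.no_grad():
        for layer_idx, params in enumerate(layer_params):
            for p in params:
                if p.grad is not None:
                    p -= lr * g[layer_idx] * p.grad
```

A MAML-style meta-optimizer would additionally differentiate through this inner update with respect to `gate_logits`, which requires keeping the update in the computation graph rather than wrapping it in `torch.no_grad()`.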