Recent multilingual pre-trained language models have achieved remarkable zero-shot performance, where the model is fine-tuned on only one source language and evaluated directly on target languages. In this work, we propose a self-learning framework that further exploits unlabeled data in the target languages and uses uncertainty estimation to select high-quality silver labels. We adapt and analyze three uncertainty estimates specifically for cross-lingual transfer: Language Heteroscedastic Uncertainty (LEU), Language Homoscedastic Uncertainty (LOU), and Evidential Uncertainty (EVI). We evaluate the framework with each uncertainty on two cross-lingual tasks, Named Entity Recognition (NER) and Natural Language Inference (NLI), covering 40 languages in total. It significantly outperforms the baselines, by 10 F1 points on average for NER and by 2.5 accuracy points for NLI.
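To make the silver-label selection step concrete, the following is a minimal sketch of uncertainty-based filtering in a self-learning loop. It is an illustration under stated assumptions, not the method described in the abstract: predictive entropy stands in for the uncertainty estimate (LEU, LOU, and EVI are not implemented here), and the function names, example identifiers, probabilities, and the threshold `max_uncertainty` are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_silver_labels(predictions, max_uncertainty):
    """Keep unlabeled examples whose predictive entropy is at most max_uncertainty.

    predictions: list of (example_id, class_probabilities) pairs.
    Returns a list of (example_id, predicted_class_index) silver labels.
    Assumption: entropy is a stand-in for the uncertainty estimate.
    """
    selected = []
    for example_id, probs in predictions:
        if entropy(probs) <= max_uncertainty:
            # The silver label is the most probable class.
            label = max(range(len(probs)), key=lambda i: probs[i])
            selected.append((example_id, label))
    return selected


# Hypothetical model outputs on unlabeled target-language sentences.
predictions = [
    ("de-001", [0.97, 0.02, 0.01]),  # low entropy: kept
    ("sw-014", [0.40, 0.35, 0.25]),  # high entropy: dropped
    ("hi-203", [0.05, 0.90, 0.05]),  # moderate entropy: kept
]

silver = select_silver_labels(predictions, max_uncertainty=0.5)
print(silver)
```

In a full self-learning loop, the selected silver labels would presumably be added to the training data and the model fine-tuned again before the next round of prediction; that retraining step is omitted here.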