Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages. However, there is still a large performance drop for languages unseen during pre-training, especially African languages. One of the most effective approaches to adapt to a new language is \textit{language adaptive fine-tuning} (LAFT) -- fine-tuning a multilingual PLM on monolingual texts of a language using the pre-training objective. However, adapting to each target language individually takes large disk space and limits the cross-lingual transfer abilities of the resulting models because they have been specialized for a single language. In this paper, we perform \textit{multilingual adaptive fine-tuning} (MAFT) on 17 most-resourced African languages and three other high-resource languages widely spoken on the African continent to encourage cross-lingual transfer learning. To further specialize the multilingual PLM, we remove vocabulary tokens from the embedding layer that correspond to non-African writing scripts before MAFT, thus reducing the model size by around 50%. Our evaluation on two multilingual PLMs (AfriBERTa and XLM-R) and three NLP tasks (NER, news topic classification, and sentiment classification) shows that our approach is competitive with applying LAFT on individual languages while requiring significantly less disk space. Additionally, we show that our adapted PLM also improves the zero-shot cross-lingual transfer abilities of parameter-efficient fine-tuning methods.
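For readers who want a concrete picture of the vocabulary-reduction step, the following is a minimal Python sketch using the Hugging Face \texttt{transformers} library. It is an illustration under simplifying assumptions, not the authors' implementation: it approximates token removal by keeping only the subword IDs observed in a target corpus (the paper instead removes tokens of non-African writing scripts), and the file name \texttt{african\_corpus.txt} is a placeholder.

\begin{verbatim}
# Illustrative sketch only (not the authors' released code): shrink XLM-R's
# embedding layer to the subword tokens observed in a target corpus, then
# continue pre-training with the MLM objective (adaptive fine-tuning).
# "african_corpus.txt" and the corpus-based selection rule are assumptions.
import torch
from transformers import XLMRobertaTokenizer, XLMRobertaForMaskedLM

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")

# 1) Collect the subword IDs actually used by the corpus (plus special tokens).
keep = set(tokenizer.all_special_ids)
with open("african_corpus.txt", encoding="utf-8") as f:
    for line in f:
        keep.update(tokenizer(line.strip(), add_special_tokens=False)["input_ids"])
keep = sorted(keep)
id_map = {old: new for new, old in enumerate(keep)}  # old ID -> new row index

# 2) Keep only the selected rows of the input embeddings and the MLM head bias.
old_emb = model.get_input_embeddings().weight.data
hidden = old_emb.size(1)
new_emb = torch.nn.Embedding(len(keep), hidden)
new_emb.weight.data = old_emb[keep].clone()
model.set_input_embeddings(new_emb)

new_bias = model.lm_head.bias.data[keep].clone()
model.lm_head.decoder = torch.nn.Linear(hidden, len(keep))
model.lm_head.bias = torch.nn.Parameter(new_bias)
model.lm_head.decoder.bias = model.lm_head.bias
model.tie_weights()                 # re-tie decoder weight to reduced embeddings
model.config.vocab_size = len(keep)

# 3) Remap token IDs with id_map when tokenizing, then continue MLM training,
#    e.g. with transformers' Trainer and DataCollatorForLanguageModeling.
\end{verbatim}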