Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks. For machine translation, such self-supervised pretrained models are often fine-tuned on parallel data from one or more language pairs. Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive. Training a new adapter on each language pair or training a single adapter on all language pairs without updating the pretrained model has been proposed as a parameter-efficient alternative. However, the former does not permit any sharing between languages, while the latter shares parameters for all languages and is susceptible to negative interference. In this paper, we propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer. Our approach outperforms related baselines, yielding higher translation scores on average when translating from English to 17 different low-resource languages. We also show that language-family adapters provide an effective method to translate into languages unseen during pretraining.
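The abstract describes inserting adapters into a frozen mBART-50 and sharing them at the level of language families rather than individual language pairs. Below is a minimal sketch of how such a setup could look, assuming standard bottleneck adapters (down-projection, non-linearity, up-projection with a residual connection) and an illustrative language-to-family mapping; the module names, dimensions, and routing function are assumptions for illustration, not the paper's exact implementation.

```python
# Minimal sketch: bottleneck adapters shared per language family on top of a
# frozen pretrained layer output (d_model = 1024 matches mBART-50's hidden size).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, add a residual connection."""

    def __init__(self, d_model: int = 1024, bottleneck: int = 256):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.activation = nn.ReLU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # The residual keeps the frozen model's representation intact; only the
        # small adapter parameters are trained.
        return hidden + self.up(self.activation(self.down(self.layer_norm(hidden))))


# One adapter per language family instead of one per language pair: related
# languages share parameters, while unrelated families do not interfere.
LANG_TO_FAMILY = {"hi": "indo-aryan", "mr": "indo-aryan", "sw": "bantu"}  # illustrative mapping
family_adapters = nn.ModuleDict(
    {family: BottleneckAdapter() for family in set(LANG_TO_FAMILY.values())}
)


def apply_family_adapter(hidden: torch.Tensor, target_lang: str) -> torch.Tensor:
    """Route a layer's output through the adapter of the target language's family."""
    return family_adapters[LANG_TO_FAMILY[target_lang]](hidden)
```

In this sketch, only the adapter parameters would be updated during fine-tuning, while the pretrained mBART-50 weights stay frozen, which is what makes the approach parameter-efficient compared to full multilingual fine-tuning.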