Large pretrained language models (LMs) have become the central building block of many NLP applications. Training these models requires ever more computational resources, and most existing models are trained on English text only. It is exceedingly expensive to train these models in other languages. To alleviate this problem, we introduce a novel method -- called WECHSEL -- to efficiently and effectively transfer pretrained LMs to new languages. WECHSEL can be applied to any model that uses subword-based tokenization and learns an embedding for each subword. The tokenizer of the source model (in English) is replaced with a tokenizer in the target language, and token embeddings are initialized such that they are semantically similar to the English tokens, by utilizing multilingual static word embeddings covering English and the target language. We use WECHSEL to transfer the English RoBERTa and GPT-2 models to four languages (French, German, Chinese, and Swahili). We also study the benefits of our method on very low-resource languages. WECHSEL improves over previously proposed methods for cross-lingual parameter transfer and outperforms models of comparable size trained from scratch with up to 64x less training effort. Our method makes training large language models for new languages more accessible and less damaging to the environment. We make our code and models publicly available.
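To make the embedding-initialization idea concrete, the following is a minimal sketch, not the paper's exact procedure: it assumes aligned multilingual static embeddings for the source and target subword vocabularies are already available, and it initializes each target token embedding as a similarity-weighted average of the source model's embeddings of that token's nearest source-language neighbors. The function name, the top-k truncation, and the softmax temperature are illustrative choices; the released WECHSEL code implements the full method.

```python
import numpy as np

def init_target_embeddings(
    source_model_embs: np.ndarray,   # (n_source_tokens, d_model) embedding matrix of the English LM
    source_static_embs: np.ndarray,  # (n_source_tokens, d_static) aligned static embeddings of source subwords
    target_static_embs: np.ndarray,  # (n_target_tokens, d_static) aligned static embeddings of target subwords
    k: int = 10,                     # number of source neighbors to average over (illustrative)
    temperature: float = 0.1,        # softmax temperature for the similarity weights (illustrative)
) -> np.ndarray:
    """Initialize target-language token embeddings as similarity-weighted
    combinations of source-model embeddings (simplified WECHSEL-style init)."""
    # Cosine similarity between every target subword and every source subword
    # in the shared (aligned) static embedding space.
    src = source_static_embs / np.linalg.norm(source_static_embs, axis=1, keepdims=True)
    tgt = target_static_embs / np.linalg.norm(target_static_embs, axis=1, keepdims=True)
    sims = tgt @ src.T  # (n_target_tokens, n_source_tokens)

    target_embs = np.zeros((tgt.shape[0], source_model_embs.shape[1]))
    for i, row in enumerate(sims):
        # Keep only the k most similar source subwords for this target subword.
        top = np.argpartition(-row, k)[:k]
        weights = np.exp(row[top] / temperature)
        weights /= weights.sum()
        # Weighted average of the corresponding source-model embeddings.
        target_embs[i] = weights @ source_model_embs[top]
    return target_embs
```

The resulting matrix would replace the source model's input (and, for tied weights, output) embedding layer after swapping in the target-language tokenizer; all other transformer parameters are carried over unchanged and then continue training on target-language text.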