Most Transformer language models are pretrained primarily on English text, limiting their use for other languages. As model sizes grow, the performance gap between English and languages with fewer compute and data resources widens further. Consequently, more resource-efficient training methods are needed to bridge this gap for less-resourced languages. To address this problem, we introduce a cross-lingual and progressive transfer learning approach, called CLP-Transfer, that transfers models from a source language for which pretrained models are publicly available, such as English, to a new target language. As opposed to prior work, which focused on cross-lingual transfer between two languages, we extend the transfer along a second dimension: model size. Given a pretrained model in the source language, we aim for a same-sized model in the target language. Instead of training the target model from scratch, we exploit a smaller model in the target language that requires far fewer resources. Both the small target-language model and the source-language model are then used to initialize the token embeddings of the larger target model, based on the overlapping vocabulary of the source and target languages. All remaining weights are reused from the source-language model. This approach outperforms cross-lingual transfer alone and can save up to 80% of the training steps compared to random initialization.
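To make the embedding-initialization step concrete, the following is a minimal NumPy sketch of one plausible reading of the abstract, not the authors' reference implementation. The helper name `clp_transfer_embeddings` and the similarity-weighting scheme for target-only tokens are assumptions introduced here for illustration; the abstract only states that both the small target-language model and the source-language model contribute to the initialization via the overlapping vocabulary.

```python
import numpy as np

def clp_transfer_embeddings(src_emb, src_vocab, small_tgt_emb, tgt_vocab):
    """Hypothetical sketch: initialize the token embeddings of a large
    target-language model from (a) the large source-language model and
    (b) a small target-language model.

    src_emb:       (|V_src|, d_large) embeddings of the large source-language model
    small_tgt_emb: (|V_tgt|, d_small) embeddings of the small target-language model
    src_vocab / tgt_vocab: dicts mapping token string -> row index
    """
    d_large = src_emb.shape[1]
    tgt_emb = np.empty((len(tgt_vocab), d_large), dtype=src_emb.dtype)

    # Tokens shared by the source and target vocabularies.
    overlap = [t for t in tgt_vocab if t in src_vocab]
    ov_tgt = np.array([tgt_vocab[t] for t in overlap])
    ov_src = np.array([src_vocab[t] for t in overlap])

    # 1) Overlapping tokens: copy the source-model embedding directly.
    tgt_emb[ov_tgt] = src_emb[ov_src]

    # 2) Target-only tokens: combine the overlapping tokens' source embeddings,
    #    weighted by similarity in the *small* target model's embedding space
    #    (assumed weighting scheme, chosen for illustration).
    ov_small = small_tgt_emb[ov_tgt]                 # (|overlap|, d_small)
    for tok, idx in tgt_vocab.items():
        if tok in src_vocab:
            continue
        sims = small_tgt_emb[idx] @ ov_small.T       # similarity to overlap tokens
        w = np.exp(sims - sims.max())
        w /= w.sum()                                 # softmax weights
        tgt_emb[idx] = w @ src_emb[ov_src]           # weighted mix in d_large space
    return tgt_emb
```

All non-embedding weights of the larger target model would then simply be copied from the same-sized source-language model, so only the embedding matrix needs language-specific initialization before continued pretraining on target-language text.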