Prior work has shown that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen. Despite learning a small subset of parameters, this approach is not compute-efficient, as training the new embeddings requires a full forward and backward pass over the entire model. In this work, we propose mini-model adaptation, a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model's parameters. New language-specific embeddings can then be efficiently trained over the mini-model, and plugged into the aligned large model for rapid cross-lingual transfer. We explore two approaches to learn mini-models: MiniJoint, which jointly pretrains the primary model and the mini-model using a single transformer with a secondary MLM head at a middle layer; and MiniPost, where we start from a regular pretrained model and build a mini-model by extracting and freezing a few layers and learning a small number of parameters on top. Experiments on XNLI, MLQA and PAWS-X show that mini-model adaptation matches the performance of the standard approach using up to 2.4x less compute.
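To make the MiniJoint idea concrete, below is a minimal PyTorch sketch of a transformer encoder with a secondary MLM head attached at a middle layer, plus the embedding-only adaptation step. All layer counts, dimensions, the tied-embedding MLM head, and the function names are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the MiniJoint setup: one transformer, two MLM heads
# (one at a middle layer for the mini-model, one at the top for the full model).
import torch
import torch.nn as nn

class MiniJointEncoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=12, mini_layers=4, n_heads=8):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(n_layers)]
        )
        self.mini_layers = mini_layers  # depth of the shallow mini-model

    def mlm_logits(self, hidden):
        # MLM output projection tied to the input embeddings (an assumption made
        # here so that swapping the embeddings also swaps the output vocabulary).
        return hidden @ self.embeddings.weight.T

    def forward(self, input_ids, mini_only=False):
        h = self.embeddings(input_ids)
        mini_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i + 1 == self.mini_layers:
                mini_logits = self.mlm_logits(h)  # secondary MLM head at a middle layer
                if mini_only:                     # adaptation: skip the upper layers entirely
                    return mini_logits, None
        return mini_logits, self.mlm_logits(h)    # joint pretraining trains both heads

def adapt_to_new_language(model: MiniJointEncoder, new_vocab_size: int) -> MiniJointEncoder:
    # Replace the embeddings with fresh language-specific ones and freeze everything else.
    d_model = model.embeddings.embedding_dim
    model.embeddings = nn.Embedding(new_vocab_size, d_model)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("embeddings")  # transformer body stays frozen
    return model
```

In this sketch, adaptation would run the forward and backward pass with `mini_only=True`, so gradients only flow through the first few layers; for cross-lingual transfer, the newly trained embeddings are used with the full stack (`mini_only=False`), matching the plug-in step described in the abstract.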