Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pretrained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at https://github.com/cambridgeltl/composable-sft.
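To make the composition step concrete, below is a minimal sketch (not the authors' released implementation) of how sparse fine-tunings can be combined with a pretrained model. It assumes the language-specific and task-specific fine-tunings are stored as sparse parameter-difference tensors that are added onto the pretrained weights, so inference uses the original architecture with no extra parameters. The model name, file paths, and the helper `apply_sparse_diffs` are illustrative assumptions, not part of the paper.

```python
# Sketch: additively compose sparse fine-tuning differences with a pretrained
# model, i.e. theta = theta_pretrained + phi_language + phi_task.
import torch
from transformers import AutoModelForTokenClassification

# Pretrained multilingual encoder with a token-classification head
# (e.g. for NER); the checkpoint name is illustrative.
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=9
)

def apply_sparse_diffs(model, *diffs):
    """Add sparse parameter differences onto the pretrained weights in place.

    Each `diff` maps a parameter name to a tensor that is zero everywhere
    except at the few positions selected during sparse fine-tuning, so the
    composition never changes the architecture or parameter count.
    """
    params = dict(model.named_parameters())
    with torch.no_grad():
        for diff in diffs:
            for name, delta in diff.items():
                params[name].add_(delta.to(params[name].dtype))
    return model

# Hypothetical files holding the sparse differences learned, e.g., via masked
# language modelling in the target language and NER in the source language.
language_diff = torch.load("language_sft.pt")  # illustrative path
task_diff = torch.load("task_sft.pt")          # illustrative path

model = apply_sparse_diffs(model, language_diff, task_diff)
model.eval()  # ready for zero-shot inference in the target language
```

Because the differences are sparse and applied in place, swapping in a different language or task only requires loading a different small difference file; this is the modularity the abstract attributes to the method, realized here under the stated additive-composition assumption.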