Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method is to train an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on novel domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on novel domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity, and find that clustering leads to the most competitive results on novel domains.
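The core operation described above, combining domain adapters by averaging their weights at test time, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `average_adapters` and the toy parameter names are assumptions, and real adapters would hold tensors rather than plain lists.

```python
def average_adapters(adapter_weights):
    """Element-wise mean of several adapters' weights (hypothetical sketch).

    Each adapter is represented as a dict mapping parameter names to lists
    of floats; all adapters are assumed to share the same parameter names
    and shapes, as they would when trained from the same PLM.
    """
    n = len(adapter_weights)
    averaged = {}
    for name in adapter_weights[0]:
        # Zip the corresponding parameter across all selected adapters
        # and take the per-element mean.
        averaged[name] = [
            sum(vals) / n for vals in zip(*(w[name] for w in adapter_weights))
        ]
    return averaged

# Toy example: two "adapters" with a single shared parameter.
a = {"layer.0.down_proj": [0.2, 0.4]}
b = {"layer.0.down_proj": [0.6, 0.0]}
soup = average_adapters([a, b])
print(soup)  # {'layer.0.down_proj': [0.4, 0.2]}
```

At test time, the averaged weights would simply be loaded in place of a single adapter, so inference cost is unchanged regardless of how many adapters are combined.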