Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance on novel domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity. We find that using clustering leads to the most competitive results on novel domains.
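To make the core operation concrete, below is a minimal sketch of weight-space averaging of adapters, assuming each domain-specific adapter is available as a PyTorch state dict with identical parameter names and shapes. The function name `average_adapters` and the uniform default weights are illustrative assumptions, not the paper's released implementation; how the adapters to average are selected (e.g., via text clustering over samples from the novel domain) is a separate step.

```python
# A minimal sketch of weight-space averaging in the spirit of AdapterSoup.
# Assumption: each adapter's parameters are given as a PyTorch state dict
# with matching keys and tensor shapes. Names here are hypothetical.
import torch


def average_adapters(adapter_state_dicts, weights=None):
    """Average the parameters of several adapters trained on different domains.

    adapter_state_dicts: list of dicts mapping parameter names to tensors,
        all sharing the same keys and shapes.
    weights: optional mixing coefficients; defaults to a uniform average.
    """
    n = len(adapter_state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    averaged = {}
    for name in adapter_state_dicts[0]:
        # Element-wise weighted sum of the corresponding parameter tensors.
        averaged[name] = sum(
            w * sd[name] for w, sd in zip(weights, adapter_state_dicts)
        )
    return averaged
```

At test time, one would pick the adapters judged most relevant to the novel domain (for instance, those whose training domains cluster with held-out text from the new domain), average their weights as above, and load the resulting state dict into the adapter modules of the frozen PLM, so no extra training is needed.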