We propose two methods to make unsupervised domain adaptation (UDA) more parameter-efficient using adapters, small bottleneck layers inserted into every layer of a large-scale pre-trained language model (PLM). The first method deconstructs UDA into a two-step process: a domain adapter is first added to learn domain-invariant information, and a task adapter is then added that uses this domain-invariant information to learn task representations in the source domain. The second method jointly learns a supervised classifier while reducing a divergence measure. Compared to strong baselines, our simple methods perform well on natural language inference (MNLI) and cross-domain sentiment classification. We even outperform unsupervised domain adaptation methods such as DANN and DSN on sentiment classification, and we are within 0.85% F1 on the natural language inference task, while fine-tuning only a fraction of the full model parameters. We release our code at https://github.com/declare-lab/UDAPTER
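To make the adapter-based setup concrete, the sketch below shows a minimal PyTorch bottleneck adapter and the stacking of a domain adapter beneath a task adapter, mirroring the two-step decomposition described above. The hidden size, bottleneck size, and GELU activation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, plus a
    residual connection. Only these parameters are trained; the PLM layer it
    is attached to stays frozen."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class StackedAdapters(nn.Module):
    """Two-step UDA sketch: a domain adapter (step 1, learns domain-invariant
    features, then frozen) followed by a task adapter (step 2, trained on
    labeled source data)."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.domain_adapter = BottleneckAdapter(hidden_size)
        self.task_adapter = BottleneckAdapter(hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.task_adapter(self.domain_adapter(hidden_states))


if __name__ == "__main__":
    # Example: adapt the hidden states of one transformer layer.
    h = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
    print(StackedAdapters()(h).shape)  # torch.Size([2, 16, 768])
```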