Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer. Extensive experiments are conducted to demonstrate the effectiveness of AdapterBias. The experiments show that our proposed method can dramatically reduce the trainable parameters compared to previous works, with only a minimal decrease in task performance relative to fine-tuned pre-trained models. We further find that AdapterBias automatically learns to assign more significant representation shifts to the tokens related to the task under consideration.
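The shift described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes the shift for token $i$ is $\alpha_i \cdot v$, where $v$ is a single learned vector shared across tokens and $\alpha_i$ is a per-token scalar produced by a linear layer from that token's hidden state. The function name `adapter_bias` and the parameter names are hypothetical.

```python
import numpy as np

def adapter_bias(hidden, v, W, b):
    """Add a token-dependent shift alpha_i * v to each hidden state.

    hidden: (batch, seq, dim) transformer layer output
    v:      (dim,) learned shift vector, shared across tokens
    W, b:   linear layer mapping each hidden state to a scalar weight
    """
    alpha = hidden @ W + b      # (batch, seq, 1) per-token weight
    return hidden + alpha * v   # broadcast v over batch and sequence

# Toy usage: only v, W, and b would be trained for a downstream task,
# so the adapter adds dim + (dim + 1) parameters per layer.
rng = np.random.default_rng(0)
dim = 4
hidden = rng.standard_normal((2, 3, dim))
v = rng.standard_normal(dim)
W = rng.standard_normal((dim, 1))
out = adapter_bias(hidden, v, W, b=0.0)
```

Note that because $v$ is shared, every token is shifted along the same direction; only the magnitude $\alpha_i$ differs per token, which is what lets the adapter assign larger shifts to task-relevant tokens.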