Adapter Tuning, which freezes the pretrained language model (PLM) and fine-tunes only a few extra modules, has become an appealing and efficient alternative to full-model fine-tuning. Although computationally efficient, recent Adapters often increase their parameter count (e.g., the bottleneck dimension) to match the performance of full-model fine-tuning, which we argue goes against their original intention. In this work, we re-examine the parameter efficiency of Adapters through the lens of network pruning (we name this plug-in concept \texttt{SparseAdapter}) and find that SparseAdapter can achieve comparable or better performance than standard Adapters at sparse ratios of up to 80\%. Based on our findings, we introduce a simple but effective setting, ``\textit{Large-Sparse}'', to improve the model capacity of Adapters under the same parameter budget. Experiments on five competitive Adapters built upon three advanced PLMs show that, with a proper sparse method (e.g., SNIP) and ratio (e.g., 40\%), SparseAdapter consistently outperforms the corresponding standard Adapter. Encouragingly, with the \textit{Large-Sparse} setting, we obtain further gains, even outperforming full fine-tuning by a large margin. Our code will be released at: https://github.com/Shwai-He/SparseAdapter.
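The parameter-budget argument behind the Large-Sparse setting can be illustrated with a small sketch. It is not taken from the released code. It assumes that an Adapter consists of a down-projection and an up-projection (biases ignored), that the sparse ratio is the fraction of weights pruned, and that SNIP-style pruning ranks weights by the saliency |w * g|. All function names are hypothetical.

```python
# Minimal sketch under the assumptions stated above; pure Python, no framework.

def adapter_param_count(hidden_dim, bottleneck_dim):
    """Weights in a down-projection plus an up-projection (biases ignored)."""
    return 2 * hidden_dim * bottleneck_dim


def large_sparse_bottleneck(bottleneck_dim, sparsity):
    """Bottleneck size whose pruned adapter keeps roughly the dense weight budget.

    Assumption: the budget is matched by scaling the bottleneck by 1 / (1 - sparsity).
    """
    return round(bottleneck_dim / (1.0 - sparsity))


def snip_style_mask(weights, grads, sparsity):
    """Return a 0/1 mask keeping the weights with the largest |w * g| scores."""
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    n_keep = len(weights) - int(sparsity * len(weights))
    # Sort indices by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


# Illustrative numbers only (hidden size 768, bottleneck 64, sparse ratio 80%).
dense_budget = adapter_param_count(768, 64)
wide_bottleneck = large_sparse_bottleneck(64, 0.8)
wide_total = adapter_param_count(768, wide_bottleneck)
wide_kept = wide_total - int(0.8 * wide_total)
print(dense_budget, wide_bottleneck, wide_kept)

# Toy weights and gradients for one flattened adapter matrix.
toy_mask = snip_style_mask(
    weights=[0.5, -2.0, 0.1, 1.0, -0.3],
    grads=[1.0, 0.1, 3.0, -0.5, 0.0],
    sparsity=0.4,
)
print(toy_mask)
```

In this sketch, a wider Adapter pruned at a given ratio retains about as many nonzero weights as the narrower dense Adapter, which is the sense in which capacity grows "under the same parameter budget". In practice, the gradients would come from a forward and backward pass over a batch of task data before training, rather than from hand-written values.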