Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task has been a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques have been proposed to improve the efficiency of fine-tuning PLMs. One popular category of PET methods is low-rank adaptation, which inserts learnable truncated-SVD modules into the original model either sequentially or in parallel. However, low-rank decomposition suffers from limited representation power. In this work, we address this problem by using the Kronecker product instead of the low-rank representation. We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs. We apply the proposed methods to fine-tune T5 on the GLUE benchmark and show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
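To make the core idea concrete, below is a minimal PyTorch sketch of a Kronecker-product adapter applied in parallel to a frozen linear layer. It is an illustrative sketch rather than the paper's exact implementation: the class name, factor shapes, and initialization are assumptions, and KronA's variants (e.g., different insertion points or residual scaling) are not reproduced here.

```python
import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Sketch: parameterize the weight update as kron(A, B)
    instead of a low-rank product of two thin matrices."""

    def __init__(self, in_features, out_features, a_rows, a_cols):
        super().__init__()
        # Factor shapes must tile the target weight:
        # a_rows * b_rows == out_features, a_cols * b_cols == in_features.
        assert out_features % a_rows == 0 and in_features % a_cols == 0
        b_rows, b_cols = out_features // a_rows, in_features // a_cols
        self.A = nn.Parameter(torch.randn(a_rows, a_cols) * 0.01)
        # Zero-init one factor so the adapter starts as a no-op (assumption,
        # mirroring the common LoRA-style convention).
        self.B = nn.Parameter(torch.zeros(b_rows, b_cols))

    def forward(self, x):
        # Delta W = A ⊗ B has full shape (out_features, in_features),
        # but only a_rows*a_cols + b_rows*b_cols parameters are trained.
        delta_w = torch.kron(self.A, self.B)
        return x @ delta_w.T

# Usage sketch: add the adapter output in parallel to a frozen layer.
frozen = nn.Linear(768, 768)
for p in frozen.parameters():
    p.requires_grad = False
adapter = KroneckerAdapter(768, 768, a_rows=16, a_cols=16)
x = torch.randn(4, 768)
y = frozen(x) + adapter(x)  # only the adapter factors are trainable
```

Because the Kronecker product of two small factors can have full rank, this parameterization avoids the rank bottleneck of truncated-SVD adapters while keeping the trainable parameter count comparably small.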