Conventional fine-tuning becomes increasingly difficult given the size of current Pre-trained Language Models (PLMs), which makes parameter-efficient tuning a focal point of frontier research. Previous methods in this field add tunable adapters to the MHA and/or FFN of Transformer blocks to enable PLMs to achieve transferability. However, the potential of layer normalization, an important part of the Transformer architecture, for parameter-efficient tuning has been ignored. In this paper, we first propose LN-tuning, which tunes only the gain and bias terms of the Layer Normalization modules with 0.03\% of the parameters; it is highly time-efficient and significantly outperforms baselines with less than 0.1\% tunable parameters. Further, we study unified frameworks that combine LN-tuning with previous methods and find that: (1) the unified framework combining prefix-tuning, the adapter-based method working on MHA, and LN-tuning achieves SOTA performance; (2) unified frameworks that tune MHA and LayerNorm simultaneously can improve performance, but those that tune FFN and LayerNorm simultaneously cause performance degradation. An ablation study validates that LN-tuning contains no redundant parameters and provides a further understanding of it.
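To make the recipe concrete, the following is a minimal PyTorch sketch of LN-tuning as described above: all backbone parameters are frozen and only the LayerNorm gain (weight) and bias terms remain trainable. The HuggingFace setup and the `bert-base-uncased` checkpoint are illustrative assumptions, not details taken from the paper.

```python
# Minimal LN-tuning sketch (assumed setup): freeze a pre-trained Transformer
# and re-enable gradients only for the LayerNorm gain and bias terms.
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

# Illustrative checkpoint; the paper's backbone may differ.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Freeze the full backbone first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the LayerNorm gain (weight) and bias terms.
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        if module.weight is not None:
            module.weight.requires_grad = True
        if module.bias is not None:
            module.bias.requires_grad = True

# Report the tunable-parameter ratio (on the order of 0.03% for BERT-base).
tunable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"tunable ratio: {100 * tunable / total:.4f}%")
```

The trainable parameters can then be passed to any standard optimizer, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`; everything else stays frozen during fine-tuning.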