Parameter-efficient fine-tuning aims to achieve performance comparable to full fine-tuning while using far fewer trainable parameters. Several strategies (e.g., Adapters, prefix tuning, BitFit, and LoRA) have been proposed. However, their designs are hand-crafted separately, and it remains unclear whether certain design patterns exist for parameter-efficient fine-tuning. Thus, we present a parameter-efficient fine-tuning design paradigm and discover design patterns that are applicable to different experimental settings. Instead of focusing on designing yet another individual tuning strategy, we introduce parameter-efficient fine-tuning design spaces that parameterize tuning structures and tuning strategies. Specifically, any design space is characterized by four components: layer grouping, trainable parameter allocation, tunable groups, and strategy assignment. Starting from an initial design space, we progressively refine the space based on the model quality of each design choice, making a greedy selection over these four components at each stage. We discover the following design patterns: (i) group layers in a spindle pattern; (ii) allocate trainable parameters to layers uniformly; (iii) tune all the groups; (iv) assign appropriate tuning strategies to different groups. These design patterns yield new parameter-efficient fine-tuning methods. We show experimentally that these methods consistently and significantly outperform the investigated parameter-efficient fine-tuning strategies across different backbone models and different natural language processing tasks.
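To make the four components concrete, the following is a minimal Python sketch of one point in such a design space, assuming a hypothetical 24-layer backbone. The `DesignSpace` container, the `spindle_grouping` heuristic, the parameter budget, and the per-group strategy names are illustrative assumptions for exposition, not the paper's exact configuration; the sketch simply encodes the four discovered patterns (spindle grouping, uniform parameter allocation, tuning every group, per-group strategy assignment).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DesignSpace:
    """One point in a parameter-efficient fine-tuning design space,
    characterized by the four components named in the abstract."""
    group_sizes: List[int]        # layer grouping: a partition of the backbone layers
    params_per_layer: List[int]   # trainable-parameter allocation per layer
    tunable: List[bool]           # which groups receive trainable parameters
    strategies: List[str]         # tuning strategy assigned to each group

def spindle_grouping(num_layers: int, num_groups: int = 4) -> List[int]:
    """Partition layers so group sizes rise toward the middle and shrink
    at the ends (a simple spindle-shaped heuristic, assumed here)."""
    mid = (num_groups - 1) / 2.0
    weights = [mid + 1.0 - abs(i - mid) for i in range(num_groups)]
    total = sum(weights)
    sizes = [max(1, round(num_layers * w / total)) for w in weights]
    sizes[-1] += num_layers - sum(sizes)  # absorb rounding drift
    return sizes

if __name__ == "__main__":
    n_layers, budget = 24, 24 * 64           # hypothetical depth and parameter budget
    sizes = spindle_grouping(n_layers)        # (i) spindle-pattern grouping, e.g. [4, 8, 8, 4]
    space = DesignSpace(
        group_sizes=sizes,
        params_per_layer=[budget // n_layers] * n_layers,    # (ii) uniform allocation
        tunable=[True] * len(sizes),                         # (iii) tune all groups
        strategies=["adapter", "prefix", "bitfit", "lora"],  # (iv) per-group strategies (illustrative)
    )
    print(space)
```

In the procedure described above, each of these four components would be fixed greedily, stage by stage, according to the model quality obtained for each candidate design choice.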