Low-Rank Adaptation (LoRA), a prominent technique within the framework of Parameter-Efficient Fine-Tuning (PEFT), efficiently reduces the computational burden associated with adapting Large Language Models (LLMs) to downstream tasks, thereby enabling resource-constrained fine-tuning. However, existing researches have shown that LoRA suffers from slow convergence. To address this limitation, we introduce Dimension-Sharding Adaptation (DiSHA), which expands the PEFT design space to even fewer trainable parameters and faster convergence. Within DiSHA's design space, we propose Block Affine Efficient Computation (Bone), a computationally efficient structure that delivers both high performance and efficiency. While certain DiSHA configurations may result in colinear updates to weight shards, we address this with Block Affine Transformation (Bat), a nonlinear variant of DiSHA. Bat introduces nonlinearity by combining trainable matrices with original weight shards in a nonlinear manner, inducing nonlinearity in matrix updates without introducing additional parameters. Empirical results show that Bone, under the DiSHA framework, consistently outperforms LoRA variants in both Natural Language Understanding and Natural Language Generation tasks, with significantly improved computational efficiency. Further analysis demonstrates that BAT enhances model capabilities by leveraging its nonlinear design.
 翻译:暂无翻译