Flexora: Flexible Low Rank Adaptation for Large Language Models

Large Language Models (LLMs) are driving advances in artificial intelligence by scaling up model parameters, which has significantly enhanced their generalization ability and unlocked new capabilities in practice. However, their performance on specific downstream tasks is often limited by their knowledge boundaries on those tasks. Fine-tuning techniques, especially the widely used Low-Rank Adaptation (LoRA) method, have therefore been introduced to expand these boundaries, yet LoRA can underperform on certain tasks because of potential overfitting. To overcome this overfitting and improve the performance of LoRA, we propose the flexible low rank adaptation (Flexora) method, which automatically and flexibly selects the most important layers to fine-tune in order to achieve the best performance on different downstream tasks. Specifically, Flexora first frames this layer selection problem as a well-defined hyperparameter optimization (HPO) problem, then solves it using the unrolled differentiation (UD) method, and finally selects the most useful layers based on the optimized hyperparameters. Extensive experiments on many pretrained models and natural language tasks show that Flexora consistently improves over existing baselines, demonstrating its effectiveness in practice. We additionally provide theoretical results and extensive ablation studies that give a comprehensive understanding of Flexora.
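The three steps described above (layer selection framed as HPO, solved by unrolled differentiation, then selection from the optimized hyperparameters) can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes a hypothetical linear model in which each "layer" contributes one gated scalar term g_i * w_i * f_i(x): the per-layer weights w_i stand in for LoRA parameters, and the gates g_i = sigmoid(alpha_i) play the role of the layer-selection hyperparameters. The inner loop runs a fixed number of gradient steps on the training loss, and the hypergradient of the validation loss with respect to alpha is obtained by differentiating through every unrolled step. Here the derivative dw/dalpha is propagated forward through the unroll (forward mode), which is only one way to realize unrolled differentiation. The functions unrolled_hypergrad, select_layers and make_toy_data, the sigmoid gating, the top-k selection rule, and all step sizes are illustrative assumptions.

```python
# Hypothetical toy sketch of bilevel layer selection via unrolled differentiation.
# Not the Flexora reference implementation; model, data and step sizes are assumptions.
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def residuals(w, g, feats, y):
    # r_k = sum_i g_i * w_i * f_i[k] - y_k : gated sum of per-layer contributions
    return [sum(g[i] * w[i] * feats[i][k] for i in range(len(w))) - y[k]
            for k in range(len(y))]


def unrolled_hypergrad(alpha, train, val, steps=50, lr=0.1):
    """Unroll `steps` of gradient descent on w (trained under gates sigmoid(alpha))
    and return (val_loss, d val_loss / d alpha), carrying Z = dw/dalpha forward."""
    (ft, yt), (fv, yv) = train, val
    L, n = len(alpha), len(yt)
    g = [sigmoid(a) for a in alpha]
    dg = [gi * (1.0 - gi) for gi in g]          # derivative of sigmoid w.r.t. alpha
    w = [0.0] * L
    Z = [[0.0] * L for _ in range(L)]           # Z[j][m] = d w_j / d alpha_m
    # Hessian of the training MSE w.r.t. w; constant because the model is linear in w.
    H_ww = [[2.0 / n * sum(g[j] * ft[j][k] * g[l] * ft[l][k] for k in range(n))
             for l in range(L)] for j in range(L)]
    for _ in range(steps):
        r = residuals(w, g, ft, yt)
        grad = [2.0 / n * sum(r[k] * g[j] * ft[j][k] for k in range(n)) for j in range(L)]
        # Mixed second derivative d(grad_j)/d(alpha_m) at the current w.
        H_wa = [[2.0 / n * sum(dg[m] * w[m] * ft[m][k] * g[j] * ft[j][k]
                               + (r[k] * ft[j][k] * dg[j] if j == m else 0.0)
                               for k in range(n))
                 for m in range(L)] for j in range(L)]
        # Differentiate the update w <- w - lr * grad, using the pre-update w and Z.
        Z = [[Z[j][m] - lr * (sum(H_ww[j][l] * Z[l][m] for l in range(L)) + H_wa[j][m])
              for m in range(L)] for j in range(L)]
        w = [w[j] - lr * grad[j] for j in range(L)]
    nv = len(yv)
    rv = residuals(w, g, fv, yv)
    val_loss = sum(x * x for x in rv) / nv
    dval_dw = [2.0 / nv * sum(rv[k] * g[j] * fv[j][k] for k in range(nv)) for j in range(L)]
    # Total derivative = direct dependence on alpha + dependence through w_T(alpha).
    hyper = [2.0 / nv * sum(rv[k] * dg[m] * w[m] * fv[m][k] for k in range(nv))
             + sum(dval_dw[j] * Z[j][m] for j in range(L)) for m in range(L)]
    return val_loss, hyper


def select_layers(train, val, num_layers, k, outer_steps=20, hyper_lr=1.0):
    """Outer loop: gradient descent on alpha, then keep the k highest-scoring layers."""
    alpha = [0.0] * num_layers
    for _ in range(outer_steps):
        _, hg = unrolled_hypergrad(alpha, train, val)
        alpha = [a - hyper_lr * h for a, h in zip(alpha, hg)]
    ranked = sorted(range(num_layers), key=lambda i: alpha[i], reverse=True)
    return sorted(ranked[:k]), alpha


def make_toy_data(n, seed, num_layers=4):
    """Hypothetical data: only layers 0 and 1 carry signal (y = 2*f0 + f1 + noise)."""
    rng = random.Random(seed)
    feats = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(num_layers)]
    y = [2.0 * feats[0][k] + feats[1][k] + rng.gauss(0.0, 0.1) for k in range(n)]
    return feats, y


if __name__ == "__main__":
    train, val = make_toy_data(40, seed=0), make_toy_data(40, seed=1)
    chosen, alpha = select_layers(train, val, num_layers=4, k=2)
    print("alpha:", [round(a, 3) for a in alpha])
    print("selected layers:", chosen)
```

Running the script prints the optimized alpha values and the k layers with the largest alpha. Keeping a fixed top-k set is an assumption made for illustration; thresholding the gates would be an equally plausible selection rule. Because the toy model is linear in w, the second-derivative terms can be written out by hand; for an actual LLM the same differentiation through the unrolled steps would be delegated to an automatic-differentiation framework.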
