Large Language Models (LLMs) are driving advances in artificial intelligence by scaling up model parameters, which has significantly enhanced generalization ability and unlocked new capabilities in practice. However, their performance on specific downstream tasks is often limited by their knowledge boundaries on those tasks. Fine-tuning techniques, especially the widely used Low-Rank Adaptation (LoRA) method, have therefore been introduced to expand these boundaries, yet LoRA can still underperform on certain tasks due to overfitting. To overcome this overfitting and improve the performance of LoRA, we propose the flexible low-rank adaptation (Flexora) method, which automatically and flexibly selects the most important layers to fine-tune so as to achieve the best performance on different downstream tasks. Specifically, Flexora first frames this layer selection problem as a well-defined hyperparameter optimization (HPO) problem, then addresses it using the unrolled differentiation (UD) method, and finally selects the most useful layers based on the optimized hyperparameters. Our extensive experiments on many pretrained models and natural language tasks show that Flexora consistently improves over existing baselines, demonstrating its effectiveness in practice. We additionally provide insightful theoretical results and extensive ablation studies to offer a comprehensive understanding of Flexora.
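As a minimal illustrative sketch (using notation we introduce here for exposition, not the paper's own symbols), the layer-selection step described above can be viewed as a bilevel HPO problem with a per-layer gate $\alpha_{l}$ on each LoRA update:
\[
\min_{\boldsymbol{\alpha}\in[0,1]^{L}} \; \mathcal{L}_{\mathrm{val}}\!\big(\mathbf{w}^{*}(\boldsymbol{\alpha}),\, \boldsymbol{\alpha}\big)
\quad \text{s.t.} \quad
\mathbf{w}^{*}(\boldsymbol{\alpha}) \in \arg\min_{\mathbf{w}} \; \mathcal{L}_{\mathrm{train}}(\mathbf{w},\, \boldsymbol{\alpha}),
\]
where $\mathbf{w}$ collects the LoRA parameters of all $L$ layers, $\mathcal{L}_{\mathrm{train}}$ and $\mathcal{L}_{\mathrm{val}}$ denote the training and validation losses, and the inner minimization is approximated by a few unrolled gradient steps so that $\nabla_{\boldsymbol{\alpha}}\mathcal{L}_{\mathrm{val}}$ can be obtained by differentiating through the unrolled trajectory (unrolled differentiation); the layers with the largest optimized gates are then kept for fine-tuning.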