Parameter-efficient continual learning has emerged as a promising approach for large language models (LLMs) to mitigate catastrophic forgetting while enabling adaptation to new tasks. Current Low-Rank Adaptation (LoRA) continual learning techniques typically retain and freeze previously learned LoRAs, or generate data representations, to overcome forgetting, using these artifacts to support new LoRAs in learning new tasks. However, these methods not only incur memory and storage costs that grow with the number of tasks, but also suffer from potential task interference due to the lack of an effective LoRA merging mechanism. In this paper, we propose a novel continual learning method that orthogonally initializes LoRA updates and sequentially merges them into a single unified LoRA. Our method leverages orthogonal bases extracted from previously learned LoRAs to initialize the learning of new tasks, and further exploits the intrinsic asymmetry of LoRA components through a time-aware scaling mechanism that balances new and old knowledge during continual merging. Our approach maintains constant memory complexity with respect to the number of tasks, minimizes interference between past and new tasks via orthogonal basis initialization, and improves upon asymmetric LoRA merging via adaptive scaling. We provide theoretical analysis to justify our design and conduct extensive experiments across diverse continual learning benchmarks using various Llama models, demonstrating the effectiveness and efficiency of our method.
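As a rough illustration of the mechanism described above (orthogonal initialization of each new task's LoRA, followed by time-aware merging into one unified adapter), the sketch below shows one plausible realization. It is not the paper's implementation: the helper names, the SVD-based orthogonal-complement extraction, and the 1/(t+1) scaling schedule are assumptions standing in for the method's actual basis extraction and adaptive scaling rules.

```python
import torch

def init_orthogonal_basis(B_merged: torch.Tensor, r_new: int) -> torch.Tensor:
    """Hypothetical helper: initialize the new task's LoRA factor inside the
    orthogonal complement of the subspace spanned by the merged LoRA so far.
    The abstract does not specify the extraction procedure; here we take the
    trailing left-singular vectors of the merged factor via a full SVD."""
    U, _, _ = torch.linalg.svd(B_merged, full_matrices=True)
    # The last r_new columns of U are orthonormal and orthogonal to the
    # column space of B_merged, so the new update starts interference-free
    # (this requires r_new <= d - rank(B_merged) for a d x r factor).
    return U[:, -r_new:]

def time_aware_merge(W_merged: torch.Tensor, delta_W: torch.Tensor, t: int) -> torch.Tensor:
    """Merge task t's LoRA update delta_W = B @ A into the single unified
    adapter. The 1/(t+1) schedule is an assumed stand-in for the paper's
    adaptive scaling: older knowledge is weighted more heavily as tasks
    accumulate, keeping memory constant (one adapter) regardless of task count."""
    alpha = 1.0 / (t + 1)
    return (1.0 - alpha) * W_merged + alpha * delta_W
```

A training loop under these assumptions would, for each task t, initialize the new LoRA factor from init_orthogonal_basis, fine-tune on the task, form delta_W, and fold it in with time_aware_merge, so only one adapter is ever stored.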