Continual Instruction Tuning (CIT) is adopted to continually instruct large models to follow human intent, dataset by dataset. It is observed that existing gradient updates severely degrade performance on previous datasets during the CIT process. In contrast, Exponential Moving Average (EMA) can trace previous parameters, which helps reduce forgetting. Nonetheless, its fixed balance weight cannot cope with ever-changing datasets, leading to an imbalance between plasticity and stability. In this paper, we propose a general continual instruction tuning framework to address this challenge. Starting from the trade-off prerequisite and the EMA update rule, we derive an ideal condition for plasticity and stability. By applying a Taylor expansion to the loss function, we find that the optimal balance weight can be determined automatically from the gradients and learned parameters. Accordingly, we propose a stability-plasticity balance coefficient to avoid knowledge interference. Based on the semantic similarity of instructions, we determine whether to retrain or expand the training parameters and allocate the most suitable parameters to each test instance. Extensive experiments across multiple continual instruction tuning benchmarks demonstrate that our approach not only enhances anti-forgetting capabilities but also significantly improves overall continual tuning performance. Our code is available at https://github.com/JingyangQiao/CoIN.
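To make the EMA mechanism concrete, below is a minimal sketch of the standard EMA parameter update that the abstract builds on. This is not the paper's full method: the adaptive, per-step balance coefficient derived from the gradients and learned parameters is not reproduced here, so the fixed `alpha` is only a hypothetical placeholder.

```python
import torch


def ema_update(ema_params, model_params, alpha):
    """Standard EMA update: theta_ema <- alpha * theta_ema + (1 - alpha) * theta.

    In the paper, the balance weight is computed automatically at each step
    from the gradients and learned parameters; here `alpha` is a fixed
    placeholder used only to illustrate the update rule.
    """
    with torch.no_grad():
        for p_ema, p in zip(ema_params, model_params):
            # Blend the traced (stable) parameters with the newly learned (plastic) ones.
            p_ema.mul_(alpha).add_(p, alpha=1.0 - alpha)
```

A larger `alpha` keeps the traced parameters closer to their previous values (more stability), while a smaller `alpha` follows the new task more closely (more plasticity); the paper's contribution is choosing this trade-off adaptively rather than fixing it.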