Adjusting the outdated knowledge of large language models (LLMs) after deployment remains a major challenge. This difficulty has spurred the development of knowledge editing, which seeks to modify a model's internal (parametric) knowledge accurately and efficiently without retraining it from scratch. However, existing methods suffer from two limitations. First, they depend on structured triplets, which are misaligned with the free-text nature of LLM pretraining and fail to capture the nuanced relationships among facts. Second, they typically support only one-time knowledge updates, leaving sequential, or lifelong, editing relatively underexplored. To address these gaps, we propose a new task, Lifelong Free-text Knowledge Editing (LF-Edit), which enables models to incorporate updates expressed in natural language and supports continual editing over time. Despite its promise, LF-Edit faces the dual challenge of integrating new knowledge while mitigating the forgetting of prior information. To foster research on this task, we construct a large-scale benchmark, the Multi-Rank Lifelong Free-text Editing Benchmark (MRLF-Bench), containing 16,835 free-text edit requests. We further design a cognitively inspired multi-rank evaluation framework spanning four levels: memorization, understanding, constrained comprehension, and reasoning. To tackle the challenges inherent in LF-Edit, we introduce a novel approach, EvoEdit, which enhances knowledge injection through Latent Perturbation Augmentation and preserves prior information via Knowledge-driven Parameter Fusion. Experimental results demonstrate that EvoEdit substantially outperforms existing knowledge editing methods on the proposed LF-Edit task.
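The abstract names two components, Latent Perturbation Augmentation and Knowledge-driven Parameter Fusion, without detailing their mechanics. The sketch below is purely illustrative, not the paper's implementation: it assumes latent perturbation means adding Gaussian noise to hidden states while training on an edit text, and parameter fusion means importance-weighted interpolation between pre-edit and post-edit weights. The function names, the noise scale `sigma`, and the `importance` scores are all hypothetical.

```python
import torch

def perturb_hidden(hidden: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Add Gaussian noise to latent representations during training on an
    edit text -- one generic reading of latent-perturbation-style
    augmentation (sigma is a hypothetical noise scale, not from the paper)."""
    return hidden + sigma * torch.randn_like(hidden)

def fuse_parameters(theta_old: dict, theta_new: dict, importance: dict) -> dict:
    """Per-parameter interpolation between pre-edit and post-edit weights,
    gated by an importance score in [0, 1] -- a generic stand-in for
    knowledge-driven fusion, not the paper's exact update rule."""
    return {
        name: importance[name] * theta_new[name]
        + (1.0 - importance[name]) * theta_old[name]
        for name in theta_old
    }

if __name__ == "__main__":
    h = torch.zeros(2, 5, 16)                      # dummy (batch, seq, dim) states
    print(perturb_hidden(h).std())                 # roughly sigma = 0.1

    old = {"w": torch.ones(4)}
    new = {"w": torch.zeros(4)}
    alpha = {"w": torch.full((4,), 0.25)}          # keep 25% of the new weights
    print(fuse_parameters(old, new, alpha)["w"])   # tensor([0.75, 0.75, 0.75, 0.75])
```

If the components do work along these lines, the appeal is plausible: noise on latent states regularizes knowledge injection against surface-form overfitting (in the spirit of embedding-noise methods such as NEFTune), while gated interpolation limits drift on parameters that encode unrelated prior knowledge, addressing the forgetting half of the LF-Edit challenge.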