Machine unlearning aims to remove unwanted information from a model, but many methods are inefficient for LLMs with large numbers of parameters, or they fail to fully remove the targeted information without degrading performance on knowledge that should be retained. Model editing algorithms address the related problem of changing information in models, but they focus on redirecting inputs to a new target rather than removing the information altogether. In this work, we explore the editing algorithms ROME, IKE, and WISE and design new editing targets for an unlearning setting. Through this investigation, we show that, depending on the setting, model editing approaches can exceed baseline unlearning methods in quality of forgetting. Like traditional unlearning techniques, however, they struggle to capture the full scope of what is to be unlearned without damaging overall model performance.