It is often desirable to remove (i.e., unlearn) a specific part of the training data from a trained neural network model. A typical application scenario is protecting the data holder's right to be forgotten, which has been promoted by many recent regulations. Existing unlearning methods involve training alternative models with the remaining data, which may be costly and challenging to verify from the perspective of the data holder or a third-party auditor. In this work, we provide a new angle and propose a novel unlearning approach that imposes a carefully crafted "patch" on the original neural network to achieve targeted "forgetting" of the data requested for deletion. Specifically, inspired by the research line of neural network repair, we propose to strategically seek a lightweight, minimal "patch" for unlearning a given data point with a certifiable guarantee. Furthermore, to unlearn a considerable number of data points (or an entire class), we propose to iteratively select a small subset of representative data points to unlearn, which achieves the effect of unlearning the whole set. Extensive experiments on multiple categorical datasets demonstrate our approach's effectiveness, achieving measurable unlearning while preserving the model's performance and remaining competitive in efficiency and memory consumption compared to various baseline methods.