Since the recent advent of regulations for data protection (e.g., the General Data Protection Regulation), there has been increasing demand for deleting information learned from sensitive data in pre-trained models without retraining from scratch. The inherent vulnerability of neural networks to adversarial attacks and unfairness also calls for a robust method to remove or correct information in an instance-wise fashion, while retaining predictive performance on the remaining data. To this end, we define instance-wise unlearning, whose goal is to delete information on a set of instances from a pre-trained model, by either misclassifying each instance away from its original prediction or relabeling it to a different class. We also propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation level, and 2) leveraging weight-importance metrics to pinpoint network parameters guilty of propagating unwanted information. Both methods require only the pre-trained model and the data instances to forget, allowing painless application to real-life settings where the entire training set is unavailable. Through extensive experiments on various image classification benchmarks, we show that our approach effectively preserves knowledge of the remaining data while unlearning the given instances, in both single-task and continual unlearning scenarios.
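To make the setup concrete, below is a minimal PyTorch sketch of one plausible instantiation of these ideas, not the paper's exact algorithm: the forgetting loss negates cross-entropy to push the forget instances away from their original labels, FGSM adversarial neighbors of the forget set stand in for nearby remaining data at the representation level, and a MAS-style gradient-magnitude importance penalizes drift on influential parameters. The function names (`unlearn`, `fgsm_examples`) and all hyperparameters are hypothetical; only the pre-trained model and the forget batch are assumed available, matching the setting described above.

```python
import copy
import torch
import torch.nn.functional as F


def fgsm_examples(model, x, eps):
    """Craft FGSM adversarial neighbors of x against the model's own predictions."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    (grad,) = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()


def unlearn(model, x_forget, y_forget, steps=50, lr=1e-4,
            eps=8 / 255, lam_adv=1.0, lam_imp=1.0):
    """Unlearn (x_forget, y_forget) while limiting forgetting of remaining data."""
    frozen = copy.deepcopy(model).eval()  # snapshot of the pre-trained weights

    # Retention at the representation level: adversarial neighbors of the
    # forget set serve as proxies for remaining data, and the updated model
    # is pinned to the frozen model's predictions on them.
    x_adv = fgsm_examples(frozen, x_forget, eps)
    with torch.no_grad():
        y_adv = frozen(x_adv).argmax(dim=1)

    # Weight importance (MAS-style): parameters whose output is sensitive on
    # the available batch are discouraged from drifting away from the anchor.
    frozen.zero_grad()
    frozen(x_forget).norm(p=2, dim=1).sum().backward()
    importance = [p.grad.abs().detach() for p in frozen.parameters()]
    anchors = [p.detach().clone() for p in frozen.parameters()]

    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        # Push forget instances away from their original labels
        # (gradient ascent on cross-entropy).
        loss_forget = -F.cross_entropy(model(x_forget), y_forget)
        # Preserve predictions on the adversarial neighborhood.
        loss_retain = F.cross_entropy(model(x_adv), y_adv)
        # Penalize movement of important parameters.
        loss_drift = sum((w * (p - a).pow(2)).sum()
                         for w, p, a in zip(importance, model.parameters(), anchors))
        opt.zero_grad()
        (loss_forget + lam_adv * loss_retain + lam_imp * loss_drift).backward()
        opt.step()
    return model
```

Note that this sketch touches the training set nowhere: adversarial neighbors and the importance estimate are both derived from the pre-trained model and the forget batch alone, which is what makes the approach applicable when the original training data is unavailable.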