Unlearning data observed during the training of a machine learning (ML) model is an important task that can play a pivotal role in strengthening the privacy and security of ML-based applications. This paper asks two questions: (i) can we unlearn one or more classes of data from an ML model without looking at the full training data even once? (ii) can we make unlearning fast, scalable to large datasets, and general across different deep networks? We introduce a novel machine unlearning framework that combines error-maximizing noise generation with impair-repair weight manipulation and offers an efficient answer to both questions. An error-maximizing noise matrix is learned for the class to be unlearned using the original model. The noise matrix is then used to manipulate the model weights so that the targeted class is unlearned. The manipulation is controlled through an impair step and a repair step. In the impair step, the noise matrix is used together with a very high learning rate to induce sharp unlearning in the model. The repair step then recovers the overall performance. With very few update steps, we show excellent unlearning while substantially retaining the overall model accuracy. Unlearning multiple classes requires about as many update steps as unlearning a single class, which makes our approach scalable to large problems. Compared with existing methods, our method is efficient, supports multi-class unlearning, places no constraints on the original optimization mechanism or network design, and works well on both small and large-scale vision tasks. This work is an important step towards fast and easy implementation of unlearning in deep networks. We will make the source code publicly available.
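To make the pipeline concrete, the following is a minimal sketch of the three stages (noise learning, impair, repair) on a toy problem. It is not the authors' implementation. It assumes a linear softmax classifier in place of a deep network, synthetic three-class data, an L2 penalty on the noise, and a small subset of retained-class samples for the impair and repair steps. All function names, learning rates, and step counts are hypothetical choices made for illustration. No sample of the forgotten class is used during unlearning.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def accuracy(W, data):
    hits = 0
    for x, y in data:
        z = logits(W, x)
        hits += z.index(max(z)) == y
    return hits / len(data)

def gd_steps(W, data, lr, steps):
    """Full-batch gradient descent on softmax cross-entropy; updates W in place."""
    k, d = len(W), len(W[0])
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(k)]
        for x, y in data:
            p = softmax(logits(W, x))
            for c in range(k):
                err = p[c] - (1.0 if c == y else 0.0)
                for j in range(d):
                    grad[c][j] += err * x[j] / len(data)
        for c in range(k):
            for j in range(d):
                W[c][j] -= lr * grad[c][j]

def learn_noise(W, forget, n, rng, steps=100, lr=0.1, lam=0.25):
    """Gradient ascent on the input: maximize CE(x, forget) - lam * ||x||^2.

    The L2 penalty (lam) is an assumption that keeps the noise bounded.
    """
    k, d = len(W), len(W[0])
    noise = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(steps):
            p = softmax(logits(W, x))
            for j in range(d):
                # d CE / d x_j = sum_c (p_c - y_c) * W[c][j]
                g = sum((p[c] - (1.0 if c == forget else 0.0)) * W[c][j]
                        for c in range(k))
                x[j] += lr * (g - 2.0 * lam * x[j])
        noise.append(x)
    return noise

def make_data(n_per_class, rng, classes=3, scale=3.0, std=0.3):
    """Toy data: class y is a Gaussian blob centred at scale * e_y."""
    return [([rng.gauss(scale if j == y else 0.0, std) for j in range(classes)], y)
            for y in range(classes) for _ in range(n_per_class)]

def unlearn_demo(seed=0, forget=0):
    rng = random.Random(seed)
    train = make_data(50, rng)
    W = [[0.0] * 3 for _ in range(3)]
    gd_steps(W, train, lr=0.1, steps=200)            # original model

    forget_set = [s for s in train if s[1] == forget]   # evaluation only
    retain_set = [s for s in train if s[1] != forget]
    before = (accuracy(W, forget_set), accuracy(W, retain_set))

    # Noise labelled with the forget class, learned from the original model.
    noise = [(x, forget) for x in learn_noise(W, forget, n=40, rng=rng)]
    # Small retain subset: the first 20 samples of each retained class.
    retain_subset = [s for c in range(3) if c != forget
                     for s in [t for t in retain_set if t[1] == c][:20]]

    gd_steps(W, noise + retain_subset, lr=0.5, steps=3)   # impair: high lr
    gd_steps(W, retain_subset, lr=0.1, steps=50)          # repair: lower lr
    after = (accuracy(W, forget_set), accuracy(W, retain_set))
    return before, after

if __name__ == "__main__":
    before, after = unlearn_demo()
    print("forget/retain accuracy before:", before)
    print("forget/retain accuracy after: ", after)
```

In this sketch the impair step is only a few full-batch updates on the noise plus the retain subset, and the repair step is plain training on the retain subset alone. A real implementation on a deep network would replace the hand-written gradients with an autodiff framework and mini-batch updates.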