Machine Unlearning has recently emerged as a paradigm for selectively removing the impact of training datapoints from a network. While existing approaches have focused on unlearning either a small subset of the training data or a single class, in this paper we take a different path and devise a framework that can unlearn all classes of an image classification network in a single untraining round. Our proposed technique learns to modulate the inner components of an image classification network through memory matrices so that, after training, the same network can selectively exhibit an unlearning behavior over any of the classes. By discovering weights which are specific to each of the classes, our approach also recovers a representation of the classes which is explainable by design. We test the proposed framework, which we name Weight Filtering network (WF-Net), on small-scale and medium-scale image classification datasets, with both CNN and Transformer-based backbones. Our work provides interesting insights into the development of explainable solutions for unlearning and could be easily extended to other vision tasks.
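The core idea of modulating a layer's weights through per-class memory matrices can be illustrated with a minimal sketch. The names (`filtered_weights`, `M`) and the gating form `W * (1 - M[c])` are assumptions for illustration only, not the paper's actual parameterization; in WF-Net the memory matrices are learned, whereas here they are random values in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, d_in, d_out = 10, 8, 4

# Base weights of one inner layer of the classifier.
W = rng.standard_normal((d_out, d_in))

# Hypothetical per-class "memory matrices": one gate per class with the
# same shape as W. Values in [0, 1]; in the actual method these would be
# trained jointly with the network.
M = rng.uniform(0.0, 1.0, size=(num_classes, d_out, d_in))

def filtered_weights(W, M, forget_class):
    """Attenuate the weights the memory matrix attributes to `forget_class`.

    Each weight entry is kept in proportion to (1 - M[c]): entries the
    class-specific gate marks as important for the forgotten class are
    suppressed, leaving the rest of the network's knowledge intact.
    """
    return W * (1.0 - M[forget_class])

def forward(x, W):
    # Toy ReLU layer standing in for one inner component of the network.
    return np.maximum(W @ x, 0.0)

x = rng.standard_normal(d_in)
y_full = forward(x, W)                                    # original behavior
y_unlearned = forward(x, filtered_weights(W, M, forget_class=3))
```

Because the gates live in [0, 1], the filtered weights never exceed the original ones in magnitude, and a different mask can be applied per class at inference time, which is what allows a single trained network to exhibit unlearning over any chosen class.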