Modern privacy regulations grant citizens the right to be forgotten by products, services, and companies. In the case of machine learning (ML) applications, this necessitates deleting data not only from storage archives but also from the ML models themselves. Due to the increasing need for regulatory compliance in ML applications, machine unlearning is becoming an emerging research problem. Right-to-be-forgotten requests take the form of removing a certain set or class of data from an already trained ML model. Practical considerations preclude retraining the model from scratch without the deleted data. The few existing studies use either the whole training data, a subset of the training data, or metadata stored during training to update the model weights for unlearning. However, strict regulatory compliance requires time-bound deletion of data. Thus, in many cases, no data related to the training process or training samples may be accessible even for unlearning purposes. We therefore ask the question: is it possible to achieve unlearning with zero training samples? In this paper, we introduce the novel problem of zero-shot machine unlearning, which caters to the extreme but practical scenario where zero original data samples are available for use. We then propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer. These methods remove the information of the forget data from the model while maintaining the model's efficacy on the retain data. The zero-shot approach offers good protection against model inversion attacks and membership inference attacks. We introduce a new evaluation metric, the Anamnesis Index (AIN), to effectively measure the quality of an unlearning method. Experiments show promising results for unlearning in deep learning models on benchmark vision datasets.
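As a rough illustration of the first idea (a minimal sketch, not the exact procedure from the paper), error minimizing-maximizing noise can be approximated as follows: error-maximizing noise is learned for the forget class and error-minimizing noise for the remaining classes, and the trained model is then fine-tuned on this synthetic noise set so that the forget-class mapping is overwritten while the other classes are approximately preserved. The function names, hyper-parameters, and the `model`, `num_classes`, `forget_class`, and `input_shape` inputs below are illustrative assumptions.

```python
# Hedged sketch of error minimizing-maximizing noise unlearning, assuming a
# trained PyTorch classifier `model` that accepts batched inputs of shape
# `input_shape`. Hyper-parameters are placeholders, not the paper's values.
import torch
import torch.nn.functional as F

def learn_noise(model, target_class, input_shape, maximize, steps=100, lr=0.1, batch=32):
    """Optimise a batch of noise so the frozen model's loss for `target_class`
    is maximised (forget class) or minimised (proxy for retain classes)."""
    model.eval()
    noise = torch.randn(batch, *input_shape, requires_grad=True)
    labels = torch.full((batch,), target_class, dtype=torch.long)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(noise), labels)
        # Gradient ascent on the loss for the forget class, descent otherwise.
        (-loss if maximize else loss).backward()
        opt.step()
    return noise.detach(), labels

def unlearn_with_noise(model, num_classes, forget_class, input_shape, ft_epochs=5, lr=1e-3):
    """Fine-tune the model on the generated noise set: error-maximizing noise
    labelled with the forget class corrupts that class, while error-minimizing
    noise for the other classes acts as a stand-in for the unavailable retain data."""
    samples, labels = [], []
    for c in range(num_classes):
        n, y = learn_noise(model, c, input_shape, maximize=(c == forget_class))
        samples.append(n)
        labels.append(y)
    x, y = torch.cat(samples), torch.cat(labels)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(ft_epochs):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model
```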
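The Anamnesis Index is named here only at the level of the abstract. As a hedged illustration of a relearn-time metric of this kind (the helpers below, including `relearn_epochs` and the evaluation callback, are hypothetical, and the paper body gives the precise definition), it can be computed as the ratio of the relearn time of the unlearned model to that of a model retrained from scratch without the forget data; values near 1 indicate unlearning comparable to full retraining, while values near 0 indicate the forgotten information is still recoverable.

```python
# Hedged sketch of a relearn-time metric in the spirit of the Anamnesis Index.
import copy
import torch

def relearn_epochs(model, forget_loader, eval_fn, target_acc, alpha=0.05, max_epochs=50):
    """Fine-tune a copy of `model` on forget-class data and return the number of
    epochs needed to come within a fraction `alpha` of `target_acc`
    (the original model's forget-class accuracy, as measured by `eval_fn`)."""
    m = copy.deepcopy(model)
    opt = torch.optim.SGD(m.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(1, max_epochs + 1):
        m.train()
        for x, y in forget_loader:
            opt.zero_grad()
            loss_fn(m(x), y).backward()
            opt.step()
        if eval_fn(m) >= (1 - alpha) * target_acc:
            return epoch
    return max_epochs

def anamnesis_index(unlearned, retrained, original_acc, forget_loader, eval_fn, alpha=0.05):
    """Ratio of the unlearned model's relearn time to that of a model
    retrained from scratch on the retain data only."""
    rt_u = relearn_epochs(unlearned, forget_loader, eval_fn, original_acc, alpha)
    rt_s = relearn_epochs(retrained, forget_loader, eval_fn, original_acc, alpha)
    return rt_u / rt_s
```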