Today, computer systems hold large amounts of personal data. Yet while such an abundance of data allows breakthroughs in artificial intelligence, and especially machine learning (ML), its existence can be a threat to user privacy, and it can weaken the bonds of trust between humans and AI. Recent regulations now require that, on request, private information about a user must be removed from both computer systems and from ML models (i.e., ``the right to be forgotten''). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context, as ML models often `remember' the old data. Contemporary adversarial attacks on trained models have shown that an adversary can infer whether an instance or an attribute belonged to the training data. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. Recent works on machine unlearning, however, have not been able to solve the problem completely, owing to the lack of common frameworks and resources. This paper therefore presents a comprehensive examination of machine unlearning's concepts, scenarios, methods, and applications. Specifically, as a categorized collection of cutting-edge studies, it is intended to serve as a comprehensive resource for researchers and practitioners seeking an introduction to machine unlearning and its formulations, design criteria, removal requests, algorithms, and applications. In addition, we highlight the key findings, current trends, and research areas that have not yet featured machine unlearning but could benefit greatly from it. We hope this survey serves as a valuable resource for ML researchers and those seeking to innovate privacy technologies. Our resources are publicly available at https://github.com/tamlhp/awesome-machine-unlearning.