" 机不学 " 调查 (A Survey of Machine Unlearning)

Computer systems have accumulated large amounts of personal data over decades. On the one hand, this abundance of data enables breakthroughs in artificial intelligence (AI), especially in machine learning (ML) models. On the other hand, it can threaten the privacy of users and weaken the trust between humans and AI. Recent regulations require that private information about a user be removable, upon request, from computer systems in general and from ML models in particular (e.g., the "right to be forgotten"). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context, as ML models often "remember" the old data. Existing adversarial attacks have shown that private membership or attributes of the training data can be inferred from trained models. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget particular data. Recent works on machine unlearning have not been able to solve the problem completely, owing to the lack of common frameworks and resources. In this survey paper, we seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications. Specifically, as a categorical collection of state-of-the-art research, we hope to provide a broad reference for those seeking a primer on machine unlearning and its various formulations, design requirements, removal requests, algorithms, and uses in a variety of ML applications. Furthermore, we hope to outline key findings and trends in the paradigm and to highlight new areas of research that have yet to see the application of machine unlearning but could nonetheless benefit immensely. We hope this survey provides a valuable reference for ML researchers as well as those seeking to innovate privacy technologies. Our resources are at https://github.com/tamlhp/awesome-machine-unlearning.
