Removing the influence of a specified subset of training data from a machine learning model may be required to address issues such as privacy, fairness, and data quality. Retraining the model from scratch on the remaining data after removing the subset is effective but often infeasible because of its computational expense. The past few years have therefore seen several novel approaches to efficient removal, forming the field of "machine unlearning"; however, many aspects of the literature published thus far are disparate and lack consensus. In this paper, we summarise and compare seven state-of-the-art machine unlearning algorithms, consolidate definitions of core concepts used in the field, reconcile different approaches for evaluating algorithms, and discuss issues related to applying machine unlearning in practice.
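As a minimal sketch of the retrain-from-scratch baseline mentioned above, the example below substitutes a hypothetical toy model (a one-dimensional ordinary least-squares line) for a real learner. The names `train` and `unlearn_by_retraining` are illustrative assumptions and do not come from any of the surveyed algorithms. The sketch shows why this baseline removes a subset's influence exactly: the "unlearned" model is, by construction, the model trained only on the retained data. It also shows why the baseline is costly, since every removal request triggers a full training run.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy setup: each training record has an id and an (x, y) pair.
Dataset = Dict[int, Tuple[float, float]]


def train(data: Dataset) -> Tuple[float, float]:
    """Fit y = w*x + b by ordinary least squares (needs >= 2 distinct x values)."""
    xs = [x for x, _ in data.values()]
    ys = [y for _, y in data.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def unlearn_by_retraining(data: Dataset, forget_ids: Set[int]) -> Tuple[float, float]:
    """Retrain-from-scratch baseline: drop the forget set, then train again.

    The forgotten records never reach `train`, so they have no influence on
    the result; the price is one full training run per removal request.
    """
    retained = {i: rec for i, rec in data.items() if i not in forget_ids}
    return train(retained)


if __name__ == "__main__":
    # Records 0-3 lie on y = 2x + 1; record 4 is an outlier to be forgotten.
    data = {0: (0.0, 1.0), 1: (1.0, 3.0), 2: (2.0, 5.0), 3: (3.0, 7.0), 4: (4.0, 100.0)}
    print(train(data))                       # pulled away from (2, 1) by record 4
    print(unlearn_by_retraining(data, {4}))  # (2.0, 1.0)
```

For realistic models, the call to `train` is the expensive step. Efficient unlearning methods aim to approximate or avoid that full retraining while still producing a model whose behaviour matches retraining on the retained data.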