Machine learning models face increased concerns regarding the storage of personal user data and the adverse impacts of corrupted data such as backdoors or systematic bias. Machine unlearning can address both by allowing post-hoc deletion of affected training data from a learned model. Achieving this deletion exactly is computationally expensive; consequently, recent works have proposed inexact unlearning algorithms that solve the problem approximately, along with evaluation methods to test the effectiveness of these algorithms. In this work, we first outline some necessary criteria for evaluation methods and show that no existing evaluation satisfies them all. We then design a stronger black-box evaluation method, the Interclass Confusion (IC) test, which adversarially manipulates data during training to detect insufficient unlearning. We also propose two analytically motivated baseline methods (EU-k and CF-k) that outperform several popular inexact unlearning methods. Overall, we demonstrate how adversarial evaluation strategies can help analyze various unlearning phenomena and guide the development of stronger unlearning algorithms.
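To make the IC test's training-time manipulation concrete, here is a minimal sketch in NumPy. It assumes integer class labels and illustrates the core idea: the forget set is built by swapping labels between two chosen classes, so a model that truly unlearns that set should no longer confuse the two classes. The helper names (`apply_interclass_confusion`, `interclass_confusion`) and the exact metric form are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def apply_interclass_confusion(labels, class_a, class_b, n_confuse, seed=0):
    """Hypothetical sketch of the IC test's data manipulation: build a
    forget set of n_confuse examples drawn from class_a and class_b, and
    exchange their labels before training."""
    rng = np.random.default_rng(seed)
    idx_a = np.flatnonzero(labels == class_a)
    idx_b = np.flatnonzero(labels == class_b)
    pick_a = rng.choice(idx_a, n_confuse // 2, replace=False)
    pick_b = rng.choice(idx_b, n_confuse // 2, replace=False)
    confused = labels.copy()
    confused[pick_a] = class_b  # class_a examples mislabelled as class_b
    confused[pick_b] = class_a  # class_b examples mislabelled as class_a
    forget_set = np.concatenate([pick_a, pick_b])
    return confused, forget_set

def interclass_confusion(preds, true, class_a, class_b):
    """One plausible confusion measure: the fraction of class_a/class_b
    examples predicted as the *other* class. After a sufficient unlearning
    procedure removes the forget set, this should drop toward the level of
    a model never trained on the swapped labels."""
    mask = (true == class_a) | (true == class_b)
    swapped = np.where(true[mask] == class_a, class_b, class_a)
    return float(np.mean(preds[mask] == swapped))
```

A residually high confusion score after unlearning is the black-box signal that information from the forget set still influences the model.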
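The EU-k and CF-k baselines can likewise be sketched briefly. The sketch below uses PyTorch and assumes, for simplicity, a model expressible as `nn.Sequential` whose last k children are the "last k layers"; the function names are hypothetical. EU-k exactly unlearns the unfrozen portion by reinitializing the last k layers and retraining them from scratch on the retained data only, while CF-k simply finetunes the last k layers on the retained data, relying on catastrophic forgetting of the deleted examples.

```python
import torch.nn as nn

def prepare_eu_k(model: nn.Sequential, k: int) -> nn.Sequential:
    """EU-k sketch: freeze all but the last k layers, reinitialize those k
    layers, and (elsewhere) retrain them from scratch on retained data."""
    layers = list(model.children())
    for layer in layers[:-k]:
        for p in layer.parameters():
            p.requires_grad = False  # earlier layers stay fixed
    for layer in layers[-k:]:
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()  # exact unlearning of this portion
    return model

def prepare_cf_k(model: nn.Sequential, k: int) -> nn.Sequential:
    """CF-k sketch: freeze all but the last k layers and continue training
    them on retained data only; no reinitialization, so forgetting of the
    deleted data is catastrophic rather than exact."""
    layers = list(model.children())
    for layer in layers[:-k]:
        for p in layer.parameters():
            p.requires_grad = False
    return model
```

Both baselines then train the unfrozen layers on the retained data with a standard optimizer; their analytical appeal is that EU-k gives an exactness guarantee for the retrained portion, while CF-k is strictly cheaper.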