The availability of large amounts of user-provided data has been key to the success of machine learning on many real-world tasks. Recently, an increasing awareness has emerged that users should be given more control over how their data is used. In particular, users should have the right to prohibit the use of their data for training machine learning systems, and to have it erased from already trained systems. While several sample erasure methods have been proposed, all of them have drawbacks that have prevented their widespread adoption: most are applicable only to very specific families of models, sacrifice too much of the original model's accuracy, or have prohibitive memory or computational requirements. In this paper, we propose SSSE, an efficient and effective algorithm for sample erasure that is applicable to a wide class of machine learning models. From a second-order analysis of the model's loss landscape, we derive a closed-form update step for the model parameters that requires access only to the data to be erased, not to the original training set. Experiments on three datasets, CelebFaces Attributes (CelebA), Animals with Attributes 2 (AwA2), and CIFAR10, show that in certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch using only the permitted data.