The availability of large amounts of user-provided data has been key to the success of machine learning on many real-world tasks. Recently, an increasing awareness has emerged that users should be given more control over how their data is used. In particular, users should have the right to prohibit the use of their data for training machine learning systems, and to have it erased from already trained systems. While several sample erasure methods have been proposed, all of them have drawbacks that have prevented their widespread adoption: most are applicable only to very specific families of models, sacrifice too much of the original model's accuracy, or have prohibitive memory or computational requirements. In this paper, we propose SSSE, an efficient and effective algorithm for sample erasure that is applicable to a wide class of machine learning models. From a second-order analysis of the model's loss landscape, we derive a closed-form update step for the model parameters that requires access only to the data to be erased, not to the original training set. Experiments on three datasets, CelebFaces Attributes (CelebA), Animals with Attributes 2 (AwA2), and CIFAR10, show that in certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch using only the permitted data.