Once users have shared their data online, it is generally difficult for them to revoke access and ask for the data to be deleted. Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it, putting users at risk of a successful privacy attack exposing their information. Yet, having models unlearn is notoriously difficult. We introduce SISA training, a framework that expedites the unlearning process by strategically limiting the influence of a data point in the training procedure. While our framework is applicable to any learning algorithm, it is designed to achieve the largest improvements for stateful algorithms like stochastic gradient descent for deep neural networks. SISA training reduces the computational overhead associated with unlearning, even in the worst-case setting where unlearning requests are made uniformly across the training set. In some cases, the service provider may have a prior on the distribution of unlearning requests that will be issued by users. We may take this prior into account to partition and order data accordingly, and further decrease overhead from unlearning. Our evaluation spans several datasets from different domains, with corresponding motivations for unlearning. Under no distributional assumptions, for simple learning tasks, we observe that SISA training improves time to unlearn points from the Purchase dataset by 4.63x, and 2.45x for the SVHN dataset, over retraining from scratch. SISA training also provides a speed-up of 1.36x in retraining for complex learning tasks such as ImageNet classification; aided by transfer learning, this results in a small degradation in accuracy. Our work contributes to practical data governance in machine unlearning.
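The sharding-and-isolation idea behind SISA training can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the per-shard "model" is just a running mean used as a stand-in for any learner, and the slicing/checkpointing component of SISA is omitted. The point it shows is that each training example influences exactly one constituent model, so unlearning only requires retraining that one shard rather than the whole ensemble.

```python
def train_model(shard):
    # Stand-in "model": the mean of the shard's values. In SISA any
    # learning algorithm can be trained per shard; this is a toy.
    return sum(shard) / len(shard) if shard else 0.0

def sisa_train(data, num_shards):
    # Partition the dataset so each point influences exactly one
    # constituent model (the "sharded, isolated" part of SISA).
    shards = [data[i::num_shards] for i in range(num_shards)]
    models = [train_model(s) for s in shards]
    return shards, models

def predict(models):
    # Aggregate the constituent models (here: simple averaging;
    # the paper aggregates predictions, e.g. by voting).
    return sum(models) / len(models)

def unlearn(shards, models, point):
    # To honor an unlearning request, locate the shard containing the
    # point and retrain only that shard -- the rest are untouched.
    for i, shard in enumerate(shards):
        if point in shard:
            shard.remove(point)
            models[i] = train_model(shard)
            break
    return shards, models
```

With, say, ten points and two shards, unlearning one point triggers retraining on roughly half the data instead of all of it; with `S` shards the expected retraining cost per request drops proportionally, which is the source of the speed-ups reported above.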