Removing information from a machine learning model is a non-trivial task that requires partially reverting the training process. This task is unavoidable when sensitive data, such as credit card numbers or passwords, accidentally enter the model and need to be removed afterwards. Recently, different concepts for machine unlearning have been proposed to address this problem. While these approaches are effective in removing individual data points, they do not scale to scenarios where larger groups of features and labels need to be reverted. In this paper, we propose the first method for unlearning features and labels. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters. It enables the influence of training data on a learning model to be adapted retrospectively, thereby correcting data leaks and privacy issues. For learning models with strongly convex loss functions, our method provides certified unlearning with theoretical guarantees. For models with non-convex losses, we empirically show that unlearning features and labels is effective and significantly faster than other strategies.
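To illustrate the closed-form idea, here is a minimal sketch of a second-order (Newton-style) unlearning update on a ridge regression model, a strongly convex case where a single closed-form step exactly matches retraining. The setup (ridge regression, zeroing out one sensitive feature value) and all variable names are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def fit(X, y):
    # Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w = fit(X, y)  # originally trained model

# "Unlearn" a leaked feature value: replace X[0, 0] with a neutral value.
Xn = X.copy()
Xn[0, 0] = 0.0

# One closed-form Newton step on the modified loss, evaluated at the old
# parameters, instead of retraining from scratch.
H = 2 * (Xn.T @ Xn) + 2 * lam * np.eye(d)   # Hessian of the modified loss
g = 2 * Xn.T @ (Xn @ w - y) + 2 * lam * w   # gradient of the modified loss at w
w_unlearned = w - np.linalg.solve(H, g)

w_retrained = fit(Xn, y)
print(np.allclose(w_unlearned, w_retrained))  # True: exact for quadratic losses
```

For quadratic losses the Newton step is exact, so the update reproduces full retraining; for general strongly convex losses it is an approximation whose quality underlies the certified-unlearning guarantees mentioned above.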