We study the problem of deleting user data from machine learning models trained using empirical risk minimization. Our focus is on learning algorithms that return the empirical risk minimizer, and on approximate unlearning algorithms that comply with deletion requests arriving in streaming minibatches. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computationally and memory efficient. Unlike prior memory-efficient unlearning algorithms, we target models that minimize objectives with non-smooth regularizers, such as the commonly used $\ell_1$, elastic net, or nuclear norm penalties. We also provide generalization, deletion capacity, and unlearning guarantees that are consistent with state-of-the-art methods. Across a variety of benchmark datasets, our algorithm empirically improves upon the runtime of prior methods while maintaining the same memory requirements and test accuracy. Finally, we open a new direction of inquiry by proving that all approximate unlearning algorithms introduced so far fail to unlearn in problem settings where common hyperparameter tuning methods, such as cross-validation, have been used to select models.
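To make the infinitesimal-jackknife idea concrete, the following is a minimal sketch (not the paper's algorithm) on ridge-regularized least squares, where the empirical risk minimizer and Hessian have closed forms. A deletion request is approximately honored by one Newton-style step: add back the deleted points' summed gradient, preconditioned by the full-data Hessian. All names and the problem setup here are illustrative assumptions; the paper's handling of non-smooth penalties such as $\ell_1$ is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the paper): ridge-regularized
# least squares, so the minimizer and Hessian are available in closed form.
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

H = X.T @ X + lam * np.eye(d)            # full-data Hessian
theta_hat = np.linalg.solve(H, X.T @ y)  # empirical risk minimizer

# Deletion request: a minibatch consisting of the first k points.
k = 5
X_del, y_del = X[:k], y[:k]

# Infinitesimal-jackknife unlearning update (one Newton-style step):
# theta_ij = theta_hat + H^{-1} * sum of deleted-point gradients at theta_hat.
g_del = X_del.T @ (X_del @ theta_hat - y_del)
theta_ij = theta_hat + np.linalg.solve(H, g_del)

# Exact retraining on the remaining data, for comparison.
H_rem = X[k:].T @ X[k:] + lam * np.eye(d)
theta_exact = np.linalg.solve(H_rem, X[k:].T @ y[k:])

print(np.linalg.norm(theta_ij - theta_exact))
```

The approximation is cheap because it reuses the full-data Hessian rather than recomputing one per deletion; when the deleted minibatch is small relative to the dataset, the updated parameters land much closer to the exactly retrained model than the stale ones do.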