Users have the right to have their data deleted by third-party learned systems, as codified by recent legislation such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Such data deletion can be accomplished by full re-training, but this incurs a high computational cost for modern machine learning models. To avoid this cost, many approximate data deletion methods have been developed for supervised learning. For unsupervised learning, in contrast, efficient data deletion, whether exact or approximate, remains largely an open problem. In this paper, we propose a density-ratio-based framework for generative models. Using this framework, we introduce a fast method for approximate data deletion and a statistical test for estimating whether or not training points have been deleted. We provide theoretical guarantees under various learner assumptions and empirically demonstrate our methods across a variety of generative models.
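To make the density-ratio idea concrete, below is a minimal sketch, assuming a kernel density estimator as the generative model (an illustrative assumption; the abstract covers a broader class of learners, and this is not the paper's exact algorithm). For a KDE trained on n points, the model retrained without k deleted points is, up to bandwidth effects, the reweighted density p_remain(x) = (n·p_full(x) − k·p_del(x)) / (n − k), so approximate deletion can be performed by rejection sampling from the full model with the implied density ratio as the acceptance weight.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative setup: a 1-D dataset, a subset of points to delete, and
# KDE "generative models" fit on the full data and on the deleted points.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)   # full training set
deleted = data[:50]                                # points to be deleted
n, k = len(data), len(deleted)

p_full = gaussian_kde(data)     # model trained on all n points
p_del = gaussian_kde(deleted)   # density of the k deleted points

def density_ratio(x):
    """Ratio p_remain(x) / p_full(x) implied by the closed form above.

    The two KDEs use their own bandwidths, so the difference can dip
    slightly below zero; clipping keeps the weights valid.
    """
    r = (n * p_full(x) - k * p_del(x)) / ((n - k) * p_full(x))
    return np.clip(r, 0.0, None)

# Approximate deletion: draw from the full model, then rejection-sample
# with the density ratio as the weight (using the empirical max of the
# weights as the envelope constant, which is itself an approximation).
proposals = p_full.resample(5000).ravel()
w = density_ratio(proposals)
accept = rng.uniform(0.0, w.max(), size=w.shape) < w
samples_after_deletion = proposals[accept]
print(f"kept {accept.mean():.0%} of proposals after reweighting")
```

The same reweighting idea plausibly extends to generative models without tractable densities by estimating the ratio with a trained classifier, which is where an explicit density-ratio framework, rather than the KDE closed form used here, becomes necessary.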