Federated clustering is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for federated clustering methods has become of significant importance. This work proposes the first known unlearning mechanism for federated clustering with privacy criteria that support simple, provable, and efficient data removal at the client and server level. The gist of our approach is to combine special initialization procedures with quantization methods that allow for secure aggregation of estimated local cluster counts at the server unit. As part of our platform, we introduce secure compressed multiset aggregation (SCMA), which is of independent interest for secure sparse model aggregation. In order to simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into the new SCMA pipeline and derive bounds on the time and communication complexity of different components of the scheme. Compared to completely retraining K-means++ locally and globally for each removal request, we obtain an average speed-up of roughly 84x across seven datasets, two of which contain biological and medical information that is subject to frequent unlearning requests.
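To make the removal-versus-retraining trade-off concrete, the following is a minimal, hypothetical sketch of why centroid-based unlearning can be cheap: if each cluster keeps a running sum and count of its assigned points, honoring a deletion request only updates one cluster's statistics rather than re-running K-means++ from scratch. The class and function names are illustrative; the paper's actual mechanism additionally involves special initialization, quantization, and secure aggregation of cluster counts, none of which are modeled here.

```python
def nearest(point, centroids):
    """Index of the closest centroid under squared Euclidean distance."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(centroids)), key=lambda j: d2(point, centroids[j]))

class UnlearnableKMeans:
    """Toy centroid store supporting cheap point deletion (illustrative only).

    Keeps per-cluster running sums and counts, so removing a point updates a
    single cluster's statistics instead of retraining on all remaining data.
    """

    def __init__(self, points, centroids):
        k, dim = len(centroids), len(centroids[0])
        self.sums = [[0.0] * dim for _ in range(k)]
        self.counts = [0] * k
        for p in points:
            c = nearest(p, centroids)
            self.counts[c] += 1
            for i, v in enumerate(p):
                self.sums[c][i] += v

    def centroids(self):
        """Current centroids implied by the stored sums and counts."""
        return [[s / n for s in row] for row, n in zip(self.sums, self.counts)]

    def unlearn(self, point, cluster):
        """Honor a removal request by subtracting the point's contribution."""
        self.counts[cluster] -= 1
        for i, v in enumerate(point):
            self.sums[cluster][i] -= v

# Usage: two clusters; deleting [0, 2] shifts only the first centroid.
model = UnlearnableKMeans([[0, 0], [0, 2], [10, 10], [10, 12]],
                          [[0, 1], [10, 11]])
model.unlearn([0, 2], 0)
```

Each deletion here costs O(d) for a d-dimensional point, which is the intuition behind the reported speed-up over full local and global retraining per removal request.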