We consider the foundational unsupervised learning task of $k$-means clustering, in a federated learning (FL) setting consisting of a central server and many distributed clients. We develop SecFC, a secure federated clustering algorithm that simultaneously achieves 1) universal performance: no performance loss compared with clustering over centralized data, regardless of how the data are distributed across clients; and 2) data privacy: each client's private data and the cluster centers are not leaked to the other clients or the server. In SecFC, the clients perform Lagrange encoding on their local data and share the coded data in an information-theoretically private manner; then, leveraging the algebraic structure of the coding, the FL network exactly executes Lloyd's $k$-means heuristic over the coded data to obtain the final clustering. Experimental results on synthetic and real datasets demonstrate the universally superior performance of SecFC across different data distributions, and its computational practicality for various combinations of system parameters. Finally, we propose an extension of SecFC that further provides membership privacy for all data points.
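The information-theoretically private sharing that the abstract attributes to Lagrange encoding can be illustrated with a minimal Shamir/Lagrange-style sketch over a prime field: a secret embedded in a random degree-$t$ polynomial is hidden from any $t$ colluding parties, while $t+1$ shares reconstruct it exactly via Lagrange interpolation. This is a simplified stand-in for SecFC's actual Lagrange coding of data batches; the field modulus, share count, and threshold below are illustrative choices, not parameters from the paper.

```python
import random

P = 2**61 - 1  # illustrative prime field modulus (Mersenne prime)

def share(secret, n, t, rng):
    # Embed `secret` as the constant term of a random degree-t polynomial;
    # any t shares reveal nothing, any t+1 shares reconstruct the secret.
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation evaluated at x = 0 recovers the constant term.
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total

# Example: 5 shares with privacy threshold t = 2; any 3 shares suffice.
shares = share(1234, n=5, t=2, rng=random.Random(1))
```

The algebraic structure exploited here — that polynomial evaluations commute with additions and multiplications — is what lets computations be carried out directly on coded data.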
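Lloyd's $k$-means heuristic that SecFC executes over the coded data alternates between assigning each point to its nearest center and moving each center to its cluster's mean. The following plaintext sketch is for illustration only (SecFC performs these steps on Lagrange-coded shares); the deterministic first-$k$-points initialization is a simplification.

```python
def lloyd_kmeans(points, k, iters=20):
    # Initialize centers with the first k points (a simplification;
    # random initialization is more common in practice).
    centers = list(points[:k])
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each nonempty cluster's center moves to its mean.
        for j, cl in enumerate(clusters):
            if cl:
                d = len(cl[0])
                centers[j] = tuple(sum(p[i] for p in cl) / len(cl)
                                   for i in range(d))
    return centers

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
centers = lloyd_kmeans(points, k=2)
```

With exact arithmetic on coded data, each iteration matches what centralized Lloyd's would compute, which is what yields the abstract's "universal performance" guarantee.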