Clustering has been extensively studied in centralized settings, but remains relatively unexplored in federated ones, where data are distributed among multiple clients and must be kept local. The case for investing more effort in federated clustering is twofold: 1) the performance of supervised federated learning models can benefit from clustering; 2) extending centralized methods to federated clustering tasks is non-trivial. In centralized settings, various deep clustering methods that jointly perform dimensionality reduction and clustering have achieved great success. To obtain high-quality cluster information, it is natural, yet non-trivial, to extend these methods to federated settings. For this purpose, we propose a simple but effective federated deep clustering method. It requires only one communication round between the central server and the clients, can run asynchronously, and can handle device failures. Moreover, although most studies have highlighted the adverse effects of non-independent and identically distributed (non-IID) data across clients, experimental results indicate that the proposed method can significantly benefit from this scenario.
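To make the one-round protocol concrete, below is a minimal sketch of a generic one-shot federated clustering pattern: each client clusters its data locally and uploads only its centroids, and the server clusters the union of uploads. This is an illustrative assumption, not the paper's actual method; the local model (plain k-means), the message contents, and all names (`local_step`, `server_step`, `K_LOCAL`, `K_GLOBAL`) are hypothetical.

```python
# Hedged sketch of one-shot federated clustering: clients send centroids
# once, the server aggregates. NOT the paper's method; all names assumed.
import numpy as np
from sklearn.cluster import KMeans

K_LOCAL, K_GLOBAL = 5, 3  # assumed local/global cluster counts

def local_step(X_client: np.ndarray) -> np.ndarray:
    """Runs entirely on-device; only the K_LOCAL centroids leave the client."""
    km = KMeans(n_clusters=K_LOCAL, n_init=10, random_state=0).fit(X_client)
    return km.cluster_centers_

def server_step(uploads: list[np.ndarray]) -> KMeans:
    """Single aggregation over whatever uploads arrived; failed or slow
    devices are simply absent from the list, so the round still completes."""
    stacked = np.vstack(uploads)
    return KMeans(n_clusters=K_GLOBAL, n_init=10, random_state=0).fit(stacked)

# Clients can compute and upload asynchronously; the server waits for no one.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=i, size=(200, 2)) for i in range(4)]
uploads = [local_step(X) for X in clients[:3]]   # e.g., one device failed
global_model = server_step(uploads)              # the single communication round
labels = global_model.predict(clients[0])        # points labeled back on-device
```

Note how this pattern exhibits the three properties the abstract claims: one upload per client, no synchronization barrier, and tolerance of missing uploads.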