This work addresses the problem of optimizing communications between server and clients in federated learning (FL). Current sampling approaches in FL are either biased, or suboptimal in terms of server-client communication and training stability. To overcome this issue, we introduce \textit{clustered sampling} for client selection. We prove that clustered sampling leads to better client representativity and to reduced variance of the clients' stochastic aggregation weights in FL. In accordance with our theory, we provide two different clustering approaches enabling client aggregation based on 1) sample size and 2) model similarity. Through a series of experiments in non-iid and unbalanced scenarios, we demonstrate that model aggregation through clustered sampling consistently leads to better training convergence and variability when compared to standard sampling approaches. Our approach does not require any additional operation on the client side, and can be seamlessly integrated into standard FL implementations. Finally, clustered sampling is compatible with existing methods and technologies for privacy enhancement and for communication reduction through model compression.
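As an illustration of the first variant, the following is a minimal Python sketch of clustered sampling based on sample size: clients are grouped into as many clusters as there are clients sampled per round, and one client is drawn per cluster with probability proportional to its data size. The function name \texttt{clustered\_sampling\_by\_size} and the size-based chunking heuristic are hypothetical simplifications introduced here for illustration only; the full algorithm additionally balances the total aggregation weight across clusters.
\begin{verbatim}
import numpy as np

def clustered_sampling_by_size(n_samples, m, rng=None):
    """Illustrative sketch (not the exact paper algorithm) of
    clustered sampling based on sample size.

    n_samples : per-client sample counts (n_samples[k] = n_k)
    m         : number of clients sampled per round (= clusters)
    Returns the indices of the m selected clients, one per cluster.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_samples = np.asarray(n_samples, dtype=float)

    # Sort clients by data size and split them into m contiguous
    # clusters, so clients of similar size share a cluster
    # (a simplifying heuristic, assumed here for brevity).
    order = np.argsort(n_samples)[::-1]
    clusters = np.array_split(order, m)

    selected = []
    for cluster in clusters:
        # Within each cluster, draw one client with probability
        # proportional to its number of samples.
        p = n_samples[cluster] / n_samples[cluster].sum()
        selected.append(int(rng.choice(cluster, p=p)))
    return selected

# Example: 10 clients with unbalanced datasets, 3 sampled per round.
sizes = [500, 480, 300, 120, 100, 90, 60, 50, 40, 20]
print(clustered_sampling_by_size(sizes, m=3))
\end{verbatim}
Because every cluster contributes exactly one client per round, each region of the client population is represented in every aggregation step, which is the intuition behind the reduced variance of the aggregation weights.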