Federated learning is an emerging paradigm that permits a large number of clients with heterogeneous data to coordinate learning of a unified global model without the need to share data with each other. Standard federated learning algorithms involve averaging of model parameters or gradient updates to approximate the global model at the server. However, in heterogeneous settings, averaging can result in information loss and lead to poor generalization due to the bias induced by dominant clients. We hypothesize that to generalize better across non-i.i.d. datasets, as in FL settings, the algorithms should focus on learning the invariant mechanism that is constant across clients while ignoring spurious mechanisms that differ across clients. Inspired by recent work in the out-of-distribution (OOD) literature, we propose a gradient masked averaging approach for federated learning as an alternative to the standard averaging of client updates. This client update aggregation technique can be adopted as a drop-in replacement in most existing federated learning algorithms. We perform extensive experiments with the gradient masked approach on multiple FL algorithms with in-distribution, real-world, and out-of-distribution (as the worst-case scenario) test datasets and show that it provides consistent improvements, particularly in the case of heterogeneous clients.
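To make the aggregation step concrete, the following is a minimal sketch of how a sign-agreement mask could be applied to averaged client updates in place of plain averaging. The function name `masked_average`, the threshold `tau`, and the soft masking rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def masked_average(client_updates, tau=0.4):
    """Illustrative sketch of gradient masked aggregation (assumed formulation).

    client_updates: list of 1-D numpy arrays, one flattened update per client.
    tau: hypothetical agreement threshold in [0, 1].
    """
    updates = np.stack(client_updates)        # shape: (num_clients, num_params)
    mean_update = updates.mean(axis=0)        # standard FedAvg-style averaging

    # Per-parameter sign agreement across clients: 1.0 means all clients agree
    # on the update direction; values near 0 mean the directions conflict.
    agreement = np.abs(np.sign(updates).mean(axis=0))

    # Keep parameters whose update direction is (mostly) invariant across
    # clients and damp the rest, treating conflicting directions as
    # client-specific (spurious) signal.
    mask = np.where(agreement >= tau, 1.0, agreement)

    return mask * mean_update

# Usage: three simulated heterogeneous clients that disagree on one coordinate.
if __name__ == "__main__":
    g1 = np.array([0.5,  0.2, -0.1])
    g2 = np.array([0.4, -0.3, -0.2])
    g3 = np.array([0.6,  0.1, -0.3])
    print(masked_average([g1, g2, g3]))
```

Because the mask only modifies how client updates are combined at the server, a sketch like this could slot into the aggregation step of most FedAvg-style algorithms without changing the client-side training loop.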