For mitigating Byzantine behaviors in federated learning (FL), most state-of-the-art approaches, such as Bulyan, tend to leverage the similarity of updates from the benign clients. However, in many practical FL scenarios, data is non-IID across clients, so the updates received even from benign clients are quite dissimilar, resulting in poor convergence of such similarity-based methods. As our main contribution, we propose \textit{DiverseFL} to overcome this challenge in heterogeneous data distribution settings. In particular, the FL server in DiverseFL computes a \textit{guiding} gradient in every iteration for each client over a small sample of the client's local data, which is received only once before the start of training. The server then applies a novel \textit{per-client} criterion for flagging Byzantine updates by comparing the corresponding guiding gradient with the client's update, and updates the model using the gradients received from the non-flagged clients. This overcomes the shortcoming of similarity-based approaches, since the flagging of a client is based on whether its update matches what is expected from its verified sample data, not on its similarity to the updates of other clients. As we demonstrate through experiments involving neural networks, benchmark datasets, and popular Byzantine attacks, including a strong backdoor attack for non-IID data, DiverseFL not only performs Byzantine mitigation quite effectively, it \textit{almost matches the performance of Oracle SGD}, where the server knows the identities of the Byzantine clients.
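To make the per-client flagging idea concrete, the following is a minimal sketch of the server-side aggregation step, under the assumption that agreement between a client's update and its guiding gradient is measured by cosine similarity against a threshold; the function \texttt{diversefl\_aggregate} and the parameter \texttt{cos\_threshold} are hypothetical names introduced only for illustration, not the paper's exact criterion.

\begin{verbatim}
import numpy as np

def diversefl_aggregate(client_updates, guiding_gradients, cos_threshold=0.0):
    """Aggregate client updates, flagging those whose update disagrees with
    the server's guiding gradient computed on that client's verified sample.

    client_updates    : dict client_id -> np.ndarray (flattened local update)
    guiding_gradients : dict client_id -> np.ndarray (gradient the server
                        computes on the client's small verified data sample)
    cos_threshold     : hypothetical flagging threshold on cosine similarity
    """
    accepted = []
    for cid, update in client_updates.items():
        guide = guiding_gradients[cid]
        # Per-client check: does the client's update point in roughly the
        # same direction as the guiding gradient from its verified sample?
        cos = float(update @ guide) / (
            np.linalg.norm(update) * np.linalg.norm(guide) + 1e-12)
        if cos > cos_threshold:          # not flagged as Byzantine
            accepted.append(update)
    # The model is updated using only the non-flagged clients' updates.
    return np.mean(accepted, axis=0) if accepted else None
\end{verbatim}

Because each client is judged only against its own guiding gradient, this check does not rely on benign clients producing similar updates, which is what breaks similarity-based defenses under non-IID data.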