Byzantine Fault-Tolerance in Federated Local SGD under 2f-Redundancy

We consider the problem of Byzantine fault-tolerance in federated machine learning. In this problem, the system comprises multiple agents, each with local data, and a trusted centralized coordinator. In the fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions defined over their local data. We consider a scenario where some agents ($f$ out of $N$) are Byzantine faulty. Such agents need not follow a prescribed algorithm correctly, and may communicate arbitrary incorrect information to the coordinator. In the presence of Byzantine agents, a more reasonable goal for the non-faulty agents is to find a minimizer of the aggregate cost function of only the non-faulty agents. This particular goal is commonly referred to as exact fault-tolerance. Recent work has shown that exact fault-tolerance is achievable if and only if the non-faulty agents satisfy the property of $2f$-redundancy. Under this property, techniques are known to impart exact fault-tolerance to the distributed implementation of the classical stochastic gradient-descent (SGD) algorithm. However, we do not know of any such techniques for the federated local SGD algorithm, a more commonly used method for federated machine learning. To address this issue, we propose a novel technique named comparative elimination (CE). We show that, under $2f$-redundancy, the federated local SGD algorithm with CE can indeed obtain exact fault-tolerance in the deterministic setting, where the non-faulty agents can accurately compute gradients of their local cost functions. In the general stochastic case, where agents can only compute unbiased noisy estimates of their local gradients, our algorithm achieves approximate fault-tolerance with approximation error proportional to the variance of the stochastic gradients and the fraction of Byzantine agents.
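
The abstract names comparative elimination (CE) but does not spell out the filter, so the following is only a minimal sketch under an assumption: in each round the coordinator ranks the $N$ local estimates it receives by Euclidean distance from its own current estimate, discards the $f$ farthest, and averages the remaining $N - f$. The function names (`comparative_elimination`, `local_gd`, `run_toy`), the quadratic costs, and the Byzantine behaviour are all hypothetical and chosen for illustration. The toy uses the deterministic setting (exact gradients), and every honest cost shares the same minimizer, which is a simple special case in which $2f$-redundancy holds.

```python
import math


def comparative_elimination(server_x, estimates, f):
    """Assumed CE filter: average the N - f estimates closest to server_x."""
    ranked = sorted(estimates, key=lambda x: math.dist(x, server_x))
    kept = ranked[: len(estimates) - f]  # drop the f farthest estimates
    dim = len(server_x)
    return [sum(x[k] for x in kept) / len(kept) for k in range(dim)]


def local_gd(x, curvature, minimizer, steps, lr):
    """Run `steps` exact gradient steps on 0.5 * curvature * ||x - minimizer||^2."""
    x = list(x)
    for _ in range(steps):
        # Gradient of the quadratic cost is curvature * (x - minimizer).
        x = [xk - lr * curvature * (xk - mk) for xk, mk in zip(x, minimizer)]
    return x


def run_toy(rounds=50, local_steps=5, lr=0.1):
    """Federated local gradient descent with the assumed CE filter (toy setup)."""
    minimizer = [1.0, -2.0]            # shared minimizer of all honest costs
    curvatures = [0.5, 1.0, 1.5, 2.0]  # four honest agents with different costs
    f = 1                              # one Byzantine agent, so N = 5
    server_x = [0.0, 0.0]
    for _ in range(rounds):
        # Each honest agent starts from the coordinator's estimate and
        # performs several local steps before reporting back.
        estimates = [
            local_gd(server_x, a, minimizer, local_steps, lr) for a in curvatures
        ]
        # The Byzantine agent reports an arbitrary (here: fixed, far-away) vector.
        estimates.append([100.0, 100.0])
        server_x = comparative_elimination(server_x, estimates, f)
    return server_x


if __name__ == "__main__":
    print(run_toy())
```

In this toy run the Byzantine report is always the farthest from the coordinator's estimate, so it is filtered out every round and the iterate contracts toward the honest minimizer. A smarter adversary could place its report close enough to survive the filter; the guarantees for that case are what the paper's analysis addresses, not this sketch.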
