Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks to system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named {\em Adaptive Stochastic Sign SGD (Ada-StoSign)} and {\em $\beta$-Stochastic Sign SGD ($\beta$-StoSign)}, each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimate of the $\ell_{\infty}$ norm of the local gradients, a key parameter used in gradient compression. We show that Ada-StoSign converges in expectation at a rate $O(\log T/\sqrt{T} + 1/\sqrt{M})$, where $M$ is the number of clients. To the best of our knowledge, when $M$ is sufficiently large, Ada-StoSign outperforms the state-of-the-art sign-based method, whose convergence rate is $O(T^{-1/4})$. Under a bounded gradient assumption, $\beta$-StoSign achieves quantifiable Byzantine resilience and privacy assurances, and works with partial client participation and mini-batch gradients, which may be unbounded. We corroborate and complement our theory with experiments on the MNIST and CIFAR-10 datasets.
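For context, a common stochastic one-bit compressor of this kind (a generic sketch; the precise operators used by Ada-StoSign and $\beta$-StoSign are defined in the body of the paper) quantizes each coordinate $g_i$ of a local gradient $g$ using a bound $B \ge \|g\|_{\infty}$:
\[
\mathrm{StoSign}(g_i; B) =
\begin{cases}
+1 & \text{with probability } \tfrac{1}{2}\bigl(1 + g_i/B\bigr), \\
-1 & \text{with probability } \tfrac{1}{2}\bigl(1 - g_i/B\bigr),
\end{cases}
\qquad
\mathbb{E}\bigl[B \cdot \mathrm{StoSign}(g_i; B)\bigr] = g_i .
\]
The compressed message is one bit per coordinate and is unbiased after rescaling by $B$, which is why a good estimate of the $\ell_{\infty}$ norm of the local gradients is a key ingredient of the compression.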