Decentralized learning has become popular as a way to improve learning efficiency and preserve data privacy. Each computing node contributes equally to collaboratively training a deep learning model. Eliminating the centralized Parameter Server (PS) addresses several issues, including privacy, performance bottlenecks, and a single point of failure. However, how to achieve Byzantine Fault Tolerance in decentralized learning systems has rarely been explored, although the problem has been studied extensively in centralized systems. In this paper, we present an in-depth study of the Byzantine resilience of decentralized learning systems, with two contributions. First, from the adversarial perspective, we show theoretically that Byzantine attacks are both more dangerous and more feasible in decentralized learning systems: even a single malicious participant can arbitrarily alter the models of other participants by sending carefully crafted updates to its neighbors. Second, from the defense perspective, we propose UBAR, a novel algorithm that equips decentralized learning with Byzantine Fault Tolerance. Specifically, UBAR provides a Uniform Byzantine-resilient Aggregation Rule that lets benign nodes select the useful parameter updates and filter out the malicious ones in each training iteration. It guarantees that each benign node in a decentralized system can train a correct model under very strong Byzantine attacks with an arbitrary number of faulty nodes. We conduct extensive experiments on standard image classification tasks, and the results indicate that UBAR defeats both simple and sophisticated Byzantine attacks more efficiently than existing solutions.
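To make the attack claim concrete, the following is a minimal sketch, not the construction or the defense from this paper. It assumes that a benign node aggregates by plainly averaging its own parameters with those it receives, and that the attacker knows the honest vectors. The distance-based filter is a generic stand-in used only to illustrate the idea of selecting updates before averaging; the abstract does not specify UBAR's actual rule, and all names (`crafted_update`, `filtered_average`, `keep`) are hypothetical.

```python
import math

def average(vectors):
    """Coordinate-wise mean of a list of equal-length parameter vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def crafted_update(honest_vectors, target):
    """Vector one attacker sends so that the plain average of the honest
    vectors plus this one equals `target` (assumes the attacker knows them)."""
    n_total = len(honest_vectors) + 1
    return [n_total * target[i] - sum(v[i] for v in honest_vectors)
            for i in range(len(target))]

def distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_average(own, received, keep):
    """Illustrative stand-in filter (an assumption, not UBAR): average `own`
    with only the `keep` received vectors closest to `own`."""
    closest = sorted(received, key=lambda v: distance(own, v))[:keep]
    return average([own] + closest)

# Toy setting: one benign node, two honest neighbors, one Byzantine neighbor.
own = [1.0, 1.0]
honest_neighbors = [[1.1, 0.9], [0.9, 1.1]]
target = [-50.0, 50.0]  # model the attacker wants the victim to adopt

malicious = crafted_update([own] + honest_neighbors, target)
poisoned = average([own] + honest_neighbors + [malicious])
defended = filtered_average(own, honest_neighbors + [malicious], keep=2)

print("plain average:   ", poisoned)
print("filtered average:", defended)
```

Under plain averaging the victim lands on the attacker's target, whereas the filtered average stays near the honest parameters. A distance filter alone is not sufficient in general: an attacker who keeps its update close to the honest ones would pass it, which is why the abstract distinguishes simple from sophisticated attacks.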