Byzantine Fault Tolerance (BFT) is one of the most challenging problems in Distributed Machine Learning (DML). It is defined as the resilience of a fault-tolerant system in the presence of malicious components. Byzantine failures remain difficult to handle because they are unrestricted: a faulty component may generate arbitrary data. Significant research effort continues to go into implementing BFT in DML. Some recent studies have reviewed various BFT approaches in DML, but they are limited in some respects: they analyze only a few approaches and do not classify the techniques those approaches use. In this paper, we present a survey of recent work on BFT in DML, focusing on first-order optimization methods, especially Stochastic Gradient Descent (SGD). We highlight key techniques as well as fundamental approaches. We provide an illustrative description of the techniques used for BFT in DML and propose a classification of BFT approaches according to their fundamental techniques. This classification is based on specific criteria such as communication process, optimization method, and topology setting, which characterize future work methods addressing open challenges.
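To make concrete why arbitrary Byzantine data is a problem for distributed SGD, the following minimal sketch compares plain gradient averaging with a coordinate-wise median. The median is one well-known robust aggregation rule, chosen here purely for illustration; it is not named in the abstract, and the worker values and function names are hypothetical.

```python
from statistics import median


def mean_aggregate(gradients):
    """Coordinate-wise mean: the usual, non-robust aggregation in distributed SGD."""
    n = len(gradients)
    return [sum(coords) / n for coords in zip(*gradients)]


def median_aggregate(gradients):
    """Coordinate-wise median: stays near the honest values while
    Byzantine workers are a minority."""
    return [median(coords) for coords in zip(*gradients)]


# Hypothetical reports: four honest workers send similar 2-D gradients,
# one Byzantine worker sends arbitrary values.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
byzantine = [[1e6, -1e6]]
reports = honest + byzantine

mean_update = mean_aggregate(reports)      # dominated by the Byzantine report
median_update = median_aggregate(reports)  # close to the honest gradients

print("mean:  ", mean_update)
print("median:", median_update)
```

In this toy setting a single arbitrary report is enough to pull the averaged update far from the honest gradients, whereas the median ignores it. Real approaches differ in how they filter or weight reports, which is what motivates a classification of techniques.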