In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures -- arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving optimal statistical performance. A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively. We prove statistical error rates for three kinds of population loss functions: strongly convex, non-strongly convex, and smooth non-convex. In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. To achieve better communication efficiency, we further propose a median-based distributed algorithm that is provably robust, and uses only one communication round. For strongly convex quadratic loss, we show that this algorithm achieves the same optimal error rate as the robust distributed gradient descent algorithms.
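To make the two aggregation rules concrete, the following is a minimal sketch (not the authors' implementation) of coordinate-wise median and coordinate-wise trimmed-mean aggregation of worker gradients, followed by one robust gradient-descent step at the server. It assumes NumPy; the function names and the toy data are illustrative only, and the trimming fraction `beta` is assumed to exceed the fraction of Byzantine machines.

```python
import numpy as np

def coordinate_wise_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate.

    gradients: array of shape (m, d), one d-dimensional gradient per machine.
    """
    return np.median(gradients, axis=0)

def coordinate_wise_trimmed_mean(gradients, beta):
    """Remove the largest and smallest beta-fraction of values in each
    coordinate, then average the remaining entries.
    """
    m = gradients.shape[0]
    k = int(np.floor(beta * m))            # entries trimmed from each end
    sorted_grads = np.sort(gradients, axis=0)
    if k > 0:
        sorted_grads = sorted_grads[k:m - k]
    return sorted_grads.mean(axis=0)

if __name__ == "__main__":
    # Toy example: 20 machines, 3 of them Byzantine, reporting 5-dim gradients.
    rng = np.random.default_rng(0)
    m, d, byzantine = 20, 5, 3
    grads = rng.normal(loc=1.0, scale=0.1, size=(m, d))  # honest gradients
    grads[:byzantine] = 100.0                            # adversarial reports
    w = np.zeros(d)
    step = 0.1
    w_median = w - step * coordinate_wise_median(grads)
    w_trimmed = w - step * coordinate_wise_trimmed_mean(grads, beta=0.2)
```

In this sketch the large adversarial reports are discarded by both rules, so the update direction stays close to the honest average; this is the behavior whose statistical error the paper analyzes sharply.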