This paper addresses the problem of combining Byzantine resilience with privacy in machine learning (ML). Specifically, we study whether a distributed implementation of the renowned Stochastic Gradient Descent (SGD) learning algorithm is feasible with both differential privacy (DP) and Byzantine resilience. To the best of our knowledge, this is the first work to tackle this problem from a theoretical point of view. Intuitively, it should be straightforward to merge standard solutions for these two (seemingly) orthogonal issues. However, a key finding of our analysis is that classical approaches to Byzantine resilience and DP in ML are incompatible. More precisely, we show that a direct composition of these techniques makes the guarantees of the resulting SGD algorithm depend unfavourably on the number of parameters in the ML model, making the training of large models practically infeasible. We validate our theoretical results through numerical experiments on publicly available datasets, showing that simultaneously ensuring DP and Byzantine resilience is impractical even for reasonable model sizes.
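To make the composition concrete, the following is a minimal sketch of the kind of direct combination discussed above: each worker applies a standard Gaussian-mechanism DP-SGD step (clip, then add noise), and the server replaces averaging with a classical Byzantine-robust rule, here assumed to be the coordinate-wise median. All function names, constants, and the choice of aggregation rule are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def dp_gradient(grad, clip_norm=1.0, noise_std=1.0, rng=None):
    """Per-worker DP step (illustrative): clip the gradient to `clip_norm`,
    then add Gaussian noise calibrated to that clipping threshold."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std * clip_norm, size=grad.shape)

def robust_aggregate(worker_grads):
    """Byzantine-robust aggregation (assumed here: coordinate-wise median)
    applied to the noisy gradients instead of a plain average."""
    return np.median(np.stack(worker_grads), axis=0)

def distributed_dp_robust_step(params, worker_grad_fns, lr=0.1):
    """One SGD step composing per-worker DP noise with robust aggregation.
    `worker_grad_fns` are hypothetical callables returning each worker's gradient."""
    noisy = [dp_gradient(fn(params)) for fn in worker_grad_fns]
    return params - lr * robust_aggregate(noisy)
```

In this sketch the DP noise is added before aggregation, so the robust rule must tolerate noise whose magnitude grows with the model dimension; this is the dimension dependence the abstract refers to.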