The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines' data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. Surprisingly, we show that the cost of this trade-off is marginal compared to that of the classical privacy-utility trade-off. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dimension dependence of the error (caused by adversarial machines and DP), while being agnostic to the statistical properties of the data.
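To make the two ingredients concrete, a minimal sketch of combining a robust aggregation rule with a DP noise injection is shown below. It uses a coordinate-wise trimmed mean and the Gaussian mechanism purely for illustration; the paper's actual high-dimensional aggregation rule and privacy calibration are not reproduced here, and the function name, `f`, and `sigma` are illustrative assumptions.

```python
import numpy as np

def private_robust_aggregate(grads, f, sigma, rng=None):
    """Illustrative sketch: robustly aggregate machine gradients, then
    add Gaussian noise for differential privacy.

    grads : (n, d) array, one gradient per machine (up to f adversarial)
    f     : number of adversarial machines tolerated
    sigma : Gaussian noise scale, calibrated offline to the DP budget
    """
    rng = np.random.default_rng(rng)
    grads = np.asarray(grads, dtype=float)
    n, d = grads.shape
    # Coordinate-wise trimmed mean (stand-in for the paper's rule):
    # in each coordinate, drop the f largest and f smallest values,
    # then average the remaining n - 2f values.
    s = np.sort(grads, axis=0)
    trimmed = s[f:n - f].mean(axis=0)
    # Gaussian mechanism: additive noise protects honest machines' data
    # against any curious observer of the aggregate.
    return trimmed + rng.normal(0.0, sigma, size=d)
```

With `sigma = 0` and one planted outlier, the trimmed mean discards the corrupted value and recovers the honest average, which is the robustness property the upper-bound construction relies on.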