In statistical learning and analysis from shared data, which is increasingly adopted in platforms such as federated learning and meta-learning, there are two major concerns: privacy and robustness. Each participating individual should be able to contribute without fear of leaking their sensitive information. At the same time, the system should be robust in the presence of malicious participants inserting corrupted data. Recent algorithmic advances in learning from shared data focus on either one of these threats, leaving the system vulnerable to the other. We bridge this gap for the canonical problem of estimating the mean from i.i.d. samples. We introduce PRIME, the first efficient algorithm that achieves both privacy and robustness for a wide range of distributions. We further complement this result with a novel exponential-time algorithm that improves the sample complexity of PRIME, achieving a near-optimal guarantee and matching a known lower bound for (non-robust) private mean estimation. This proves that there is no extra statistical cost to simultaneously guaranteeing privacy and robustness.