Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices, initiating an extensive line of research in accurately and privately releasing statistics of a database. An increasing number of such databases consist of data from multiple sources, not all of which can be trusted. This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data. Despite the significance of designing algorithms that guarantee privacy and robustness (to a fraction of data being corrupted) simultaneously, even the simplest questions remain open. For the canonical problem of estimating the mean from i.i.d. samples, we introduce the first efficient algorithm that achieves both privacy and robustness for a wide range of distributions. This achieves optimal accuracy matching the known lower bounds for robustness, but the sample complexity has a factor of $d^{1/2}$ gap from known lower bounds. We further show that this gap is due to the computational efficiency; we introduce the first family of algorithms that close this gap but takes exponential time. The innovation is in exploiting resilience (a key property in robust estimation) to adaptively bound the sensitivity and improve privacy.
翻译:在从美国人口普查到商业设备中收集的数据的各种应用中,不同隐私已成为一项标准要求,从从美国人口普查到商业设备中收集的数据,开始对数据库准确和私下发布的统计数据进行广泛的研究。越来越多的这类数据库由多种来源的数据组成,并非所有数据都可信任。这使得现有的私人分析很容易受到一个输入腐败数据的对手的攻击。尽管设计算法,保证隐私和稳健性(数据的一部分被腐蚀)很重要,但即使是最简单的问题也仍未解决。对于从i.d.样本中估算平均值的典型问题,我们引入了第一个高效的算法,既能实现隐私,又能为广泛的分布实现稳健性。这实现了最佳的准确性,与已知的更低范围相匹配,但抽样的复杂性有一个因子,即与已知的较低界限相差1/2美元。我们进一步表明,这一差距是计算效率造成的;我们引入了第一组缩小这一差距但需要指数化时间的算法。创新是在利用复原力(一种稳健估算中的关键财产)来提高适应性的敏感性。