Differential privacy offers formal quantitative guarantees for algorithms over datasets, but it assumes attackers that know and can influence all but one record in the database. This assumption often vastly overapproximates the attackers' actual strength, resulting in unnecessarily poor utility. Recent work has made significant steps towards privacy in the presence of partial background knowledge, which can model a realistic attacker's uncertainty. Prior work, however, has definitional problems for correlated data and does not precisely characterize the underlying attacker model. We propose a practical criterion to prevent problems due to correlations, and we show how to characterize attackers with limited influence or only partial background knowledge over the dataset. We use these foundations to analyze practical scenarios: we significantly improve known results about the privacy of counting queries under partial knowledge, and we show that thresholding can provide formal guarantees against such weak attackers, even with little entropy in the data. These results allow us to draw novel links between k-anonymity and differential privacy under partial knowledge. Finally, we prove composition results on differential privacy with partial knowledge, which quantify the privacy leakage of complex mechanisms. Our work provides a basis for formally quantifying the privacy of many widely-used mechanisms, e.g., publishing the results of surveys, elections, or referendums, and releasing usage statistics of online services.