We consider the problem of ensuring the confidentiality of dataset properties that are aggregated over many records. Such properties can encode sensitive information, such as trade secrets or demographic data, and involve a notion of data protection different from the privacy of individual records typically discussed in the literature. In this work, we demonstrate how a distribution privacy framework can be applied to formalize such data confidentiality. We extend the Wasserstein Mechanism from Pufferfish privacy and the Gaussian Mechanism from attribute privacy to this framework, then analyze their underlying data assumptions and how these assumptions can be relaxed. We then empirically evaluate the privacy-utility tradeoffs of these mechanisms and apply them against a practical property inference attack that targets global properties of datasets. The results show that our mechanisms can indeed reduce the effectiveness of the attack while providing substantially greater utility than a crude group differential privacy baseline. Our work thus lays the groundwork for theoretical mechanisms that protect global properties of datasets, together with their evaluation in practice.