Protecting Global Properties of Datasets with Distribution Privacy Mechanisms

Alongside the rapid development of data collection and analysis techniques in recent years, there is an increasing emphasis on the need to address information leakage associated with such usage of data. To this end, much work in the privacy literature is devoted to the protection of individual users and contributors of data. However, some situations instead require a different notion of data confidentiality involving global properties aggregated over the records of a dataset. Such notions of information protection are particularly applicable for business and organization data, where global properties may reflect trade secrets, or demographic data, which can be harmful if mishandled. Recent work on property inference attacks furthermore shows how data analysis algorithms can be susceptible to leaking these global properties of data, highlighting the importance of developing mechanisms that can protect such information. In this work, we demonstrate how a distribution privacy framework can be applied to formalize the problem of protecting global properties of datasets. Given this framework, we investigate several mechanisms and their tradeoffs for providing this notion of data confidentiality. We analyze the theoretical protection guarantees offered by these mechanisms under various data assumptions, then implement and empirically evaluate these mechanisms for several data analysis tasks. The results of our experiments show that our mechanisms can indeed reduce the effectiveness of practical property inference attacks while providing utility substantially greater than a crude group differential privacy baseline. Our work thus provides groundwork for theoretically supported mechanisms for protecting global properties of datasets.
