Data sharing between different parties has become increasingly common across industry and academia. An important class of privacy concerns in data sharing scenarios involves the underlying distribution of the data rather than any individual record. For example, the total traffic volume in data shared by a networking company can reveal the scale of its business, which may be considered a trade secret. Unfortunately, existing privacy frameworks (e.g., differential privacy, anonymization) do not adequately address such concerns. In this paper, we propose summary statistic privacy, a framework for analyzing and protecting against these concerns. We propose a class of quantization mechanisms that can be tailored to various data distributions and statistical secrets, and we analyze their privacy-distortion tradeoffs under our framework. We prove corresponding lower bounds on the privacy-distortion tradeoff, which match the tradeoffs of the quantization mechanisms under certain regimes, up to small constant factors. Finally, we demonstrate that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms on real-world datasets.
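To make the idea of a quantization mechanism concrete, the following is a minimal sketch under stated assumptions, not the mechanism defined in the paper. It assumes the secret is the sample mean, that the data holder releases a shifted copy of the data, and that distortion is measured by the size of the shift. The function name `quantize_mean` and the parameter `bin_width` are hypothetical and chosen only for illustration.

```python
import math


def quantize_mean(samples, bin_width):
    """Shift samples so their mean lands on the midpoint of its bin.

    Illustrative assumption: the secret is the sample mean. All datasets
    whose means fall in the same bin are released with the same mean, so
    an observer learns the bin but not the position inside it.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not samples:
        raise ValueError("samples must be non-empty")

    mean = sum(samples) / len(samples)
    # Index of the bin [k * bin_width, (k + 1) * bin_width) containing the mean.
    bin_index = math.floor(mean / bin_width)
    # Every mean in this bin is mapped to the bin midpoint.
    released_mean = (bin_index + 0.5) * bin_width
    shift = released_mean - mean
    # The shift is at most bin_width / 2 in magnitude: wider bins hide
    # more about the mean but distort the released data more.
    return [x + shift for x in samples], shift


if __name__ == "__main__":
    released, shift = quantize_mean([1.0, 2.0, 3.0, 4.0], bin_width=2.0)
    print(released, shift)
```

In this sketch the single parameter `bin_width` controls the tradeoff the abstract refers to: a larger bin leaves more uncertainty about the true mean, at the cost of a larger worst-case shift of the released data.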