As the ability to collect and analyze personal data grows rapidly, data privacy has become a pressing concern. In this work, we develop a new statistical notion of local privacy that protects each categorical data value collected by an untrusted entity. The proposed solution, named subset privacy, privatizes the original data value by replacing it with a random subset that contains that value. We develop methods, with theoretical guarantees, for estimating distribution functions and testing independence from subset-private data. We also study different mechanisms for realizing subset privacy, along with evaluation metrics that quantify the amount of privacy achieved in practice. Experimental results on both simulated and real-world datasets show encouraging performance of the developed concepts and methods.
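To make the idea of replacing a value with a random subset concrete, the following is a minimal sketch and not the authors' implementation. It assumes one simple mechanism: the true category is always kept, and every other category is added independently with a hypothetical probability `include_prob`. The estimator is a standard EM iteration for this kind of set-valued observation; it is offered as an illustration only and may differ from the estimator developed in the paper. The function names `privatize` and `estimate_distribution` are made up for this example.

```python
import random


def privatize(value, categories, include_prob=0.5, rng=random):
    """Return a random subset of `categories` that always contains `value`.

    Assumption: every other category is included independently with
    probability `include_prob` (an illustrative choice, not from the paper).
    """
    if value not in categories:
        raise ValueError("value must be one of the categories")
    subset = {value}
    for c in categories:
        if c != value and rng.random() < include_prob:
            subset.add(c)
    return frozenset(subset)


def estimate_distribution(subsets, categories, n_iter=200):
    """Estimate category probabilities from subset-private data via EM.

    Under the independent-inclusion assumption above, the chance of observing
    a given subset is the same for every true value inside it, so the E-step
    weight of category c in subset s is probs[c] / sum(probs over s).
    """
    if not subsets:
        raise ValueError("need at least one observed subset")
    probs = {c: 1.0 / len(categories) for c in categories}
    for _ in range(n_iter):
        counts = {c: 0.0 for c in categories}
        for s in subsets:
            total = sum(probs[c] for c in s)
            for c in s:
                counts[c] += probs[c] / total
        probs = {c: counts[c] / len(subsets) for c in categories}
    return probs


if __name__ == "__main__":
    rng = random.Random(0)
    categories = ["a", "b", "c"]
    # Synthetic data for illustration only: true frequencies 0.6, 0.3, 0.1.
    raw = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
    private = [privatize(x, categories, rng=rng) for x in raw]
    print(estimate_distribution(private, categories))
```

The collector only ever sees the subsets in `private`, never the values in `raw`. A larger `include_prob` produces larger subsets, which hides the true value better but makes the distribution harder to estimate.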