Large volumes of high-dimensional and heterogeneous data arise in practical applications and are often published to third parties for data analysis, recommendation, targeted advertising, and reliable prediction. However, publishing these data may disclose sensitive personal information, raising growing concern about privacy violations. Privacy-preserving data publishing has therefore received considerable attention in recent years, yet the differentially private publication of high-dimensional data remains a challenging problem. In this paper, we propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases: a Markov-blanket-based attribute clustering phase and an invariant post randomization (PRAM) phase. Specifically, splitting the attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling yields a reasonable allocation of the privacy budget, while a double-perturbation mechanism satisfying local differential privacy enables an invariant PRAM that ensures no loss of statistical information and thus substantially preserves data utility. We also extend DP2-Pub to a scenario with a semi-honest server, where the extended mechanism satisfies local differential privacy. Extensive experiments on four real-world datasets demonstrate that our mechanism significantly improves the utility of the published data while satisfying differential privacy.
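The abstract does not spell out how a double perturbation yields an invariant PRAM. One standard construction from the PRAM literature composes a forward perturbation matrix P with its Bayes "backward" matrix Q, where Q[j][i] = Pr(original = i | reported = j) under the category distribution π. The composite R = PQ leaves π unchanged: (πPQ)_i = Σ_j (πP)_j · π_i P_ij / (πP)_j = π_i Σ_j P_ij = π_i. The minimal sketch below assumes k-ary randomized response as P and a known (or separately, privately estimated) distribution π for a single attribute; all function names are hypothetical, and this is an illustration of the general technique, not necessarily DP2-Pub's exact procedure.

```python
import math
import random


def grr_matrix(k, epsilon):
    """k-ary randomized response (assumed choice of P): report the true category
    with prob e^eps/(e^eps+k-1), each other category with prob 1/(e^eps+k-1).
    The max/min ratio within any column is e^eps, so this satisfies eps-LDP."""
    denom = math.exp(epsilon) + k - 1
    keep, flip = math.exp(epsilon) / denom, 1.0 / denom
    return [[keep if i == j else flip for j in range(k)] for i in range(k)]


def backward_matrix(P, pi):
    """Q[j][i] = Pr(original = i | reported = j), by Bayes' rule under pi."""
    k = len(pi)
    reported = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return [[pi[i] * P[i][j] / reported[j] for i in range(k)] for j in range(k)]


def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def vec_mat(v, M):
    """Row vector times matrix: v @ M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def invariant_pram(P, pi):
    """Composite matrix R = P Q, for which pi R = pi."""
    return matmul(P, backward_matrix(P, pi))


def sample_row(row, rng):
    """Draw an index according to the probability vector `row`."""
    u, acc = rng.random(), 0.0
    for j, p in enumerate(row):
        acc += p
        if u < acc:
            return j
    return len(row) - 1  # guard against floating-point rounding


def double_perturb(value, P, Q, rng):
    """Two-stage perturbation of one categorical value; its output law is R[value]."""
    y = sample_row(P[value], rng)  # stage 1: eps-LDP randomized response
    return sample_row(Q[y], rng)   # stage 2: reads only y (post-processing)


if __name__ == "__main__":
    pi = [0.5, 0.3, 0.2]  # hypothetical category distribution of one attribute
    P = grr_matrix(len(pi), epsilon=1.0)
    R = invariant_pram(P, pi)
    print([round(x, 9) for x in vec_mat(pi, R)])  # [0.5, 0.3, 0.2]
```

Because the second stage depends only on the first-stage output, it is post-processing and does not weaken the ε-LDP guarantee of P, provided π itself is obtained without touching the raw data (for example, estimated privately). The price is extra per-record noise; the benefit is that the expected category frequencies of the perturbed data equal the original ones.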