Minimizing privacy leakage while preserving data utility is a critical problem for data holders in privacy-preserving data publishing. Most prior research considers only one type of data and relies on a single obscuring method, e.g., obfuscation or generalization, to achieve a privacy-utility tradeoff. This is inadequate for protecting real-life heterogeneous data and struggles to defend against the growing range of machine-learning-based inference attacks. This work presents a pilot study on privacy-preserving data publishing in which both generalization and obfuscation operations are employed to protect heterogeneous data. To this end, we first propose novel measures for quantifying privacy and utility, and formulate the hybrid privacy-preserving data obscuring problem to account for the joint effect of generalization and obfuscation. We then design a novel hybrid protection mechanism, called HyObscure, that cross-iteratively optimizes the generalization and obfuscation operations to maximize privacy protection under a given utility guarantee. We also provide theoretical results on the convergence of the iterative process and on the privacy leakage bound of HyObscure. Extensive experiments demonstrate that HyObscure significantly outperforms a variety of state-of-the-art baseline methods against various inference attacks under different scenarios. HyObscure also scales linearly with the data size and behaves robustly under varying key parameters.