Synthetic data is a promising approach to privacy protection in many contexts. A Bayesian synthesis model, also known as a synthesizer, simulates synthetic values of sensitive variables from their posterior predictive distributions. The resulting synthetic data can then be released in place of the confidential data. An important step before releasing synthetic data is evaluating its level of privacy protection, often through disclosure risk evaluation. Attribute disclosure, in which an intruder correctly infers the confidential values of synthetic records, is one type of disclosure that is computationally challenging to evaluate. In this paper, we review and discuss in detail several Bayesian estimation approaches to attribute disclosure risk evaluation, with examples of commonly used Bayesian synthesizers. We create the $\texttt{AttributeRiskCalculation}$ R package to facilitate their implementation, and demonstrate its functionality by evaluating attribute disclosure risks in synthetic samples of the Consumer Expenditure Surveys.
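To make the idea of simulating from a posterior predictive distribution concrete, the following is a minimal sketch in Python. It is not the method of the $\texttt{AttributeRiskCalculation}$ package and not one of the synthesizers reviewed in the paper. It assumes, purely for illustration, a single continuous sensitive variable modeled as normal with the noninformative prior $p(\mu, \sigma^2) \propto 1/\sigma^2$; the function name `synthesize_normal` and the example values are hypothetical.

```python
import random
import statistics


def synthesize_normal(confidential, seed=0):
    """Return one synthetic dataset of the same size as `confidential`.

    Illustrative assumptions (not taken from the paper):
    y_i ~ Normal(mu, sigma^2) with prior p(mu, sigma^2) proportional to 1/sigma^2.
    """
    rng = random.Random(seed)
    n = len(confidential)
    ybar = statistics.fmean(confidential)
    s2 = statistics.variance(confidential)  # sample variance, n - 1 denominator

    # Step 1: draw sigma^2 from its marginal posterior,
    # a scaled inverse chi-square with n - 1 degrees of freedom and scale s2.
    chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)  # chi-square with n - 1 df
    sigma2 = (n - 1) * s2 / chi2

    # Step 2: draw mu from its conditional posterior Normal(ybar, sigma2 / n).
    mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)

    # Step 3: simulate synthetic values given the drawn parameters.
    return [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)]


# Hypothetical confidential values, for illustration only.
confidential_values = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
synthetic_values = synthesize_normal(confidential_values, seed=1)
```

Here one parameter draw is used per synthetic dataset, so repeating the call with different seeds would yield multiple synthetic datasets. An attribute disclosure risk evaluation would then ask how well an intruder holding `synthetic_values` could recover entries of `confidential_values`.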