Private Tabular Survey Data Products through Synthetic Microdata Generation

We propose two synthetic microdata approaches for generating private tabular survey data products for public release. We adapt a pseudo posterior mechanism, which downweights each record's likelihood contribution by a weight in $[0,1]$ based on that record's identification disclosure risk, to the production of tabular products from survey data. When applied to an observed survey database, our method achieves an asymptotic global probabilistic differential privacy guarantee. Both approaches jointly synthesize the observed sample distribution of the outcome and the survey weights, so that the two quantities together possess a privacy guarantee. The privacy-protected outcome and survey weights are then used to construct tabular cell estimates (where the cell inclusion indicators are treated as known and public) and associated standard errors that correct for survey sampling bias. Through a real data application to the Survey of Doctorate Recipients public use file, and through simulation studies motivated by that application, we demonstrate that our two microdata synthesis approaches to constructing tabular products preserve utility better than the additive-noise approach of the Laplace Mechanism. Moreover, our approaches allow the microdata themselves to be released to the public, enabling additional analyses at no extra privacy cost.
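To make the downweighting idea concrete, the following is a minimal sketch of a risk-weighted pseudo posterior in a toy setting. It is not the model used in this work: it assumes a normal outcome with known variance and a normal prior on the mean, chosen only because raising each likelihood term to a power $\alpha_i \in [0,1]$ keeps the posterior in closed form. The function names, the example weights, and the prior settings are all hypothetical, and the sketch does not compute or certify any privacy guarantee.

```python
import numpy as np


def pseudo_posterior_normal_mean(x, alpha, sigma2, prior_mean=0.0, prior_var=1e6):
    """Pseudo posterior of a normal mean with known variance sigma2.

    Each likelihood term N(x_i | mu, sigma2) is raised to the power alpha_i,
    so a record with alpha_i near 0 contributes almost nothing.
    Toy model assumed for illustration only.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    if x.shape != alpha.shape:
        raise ValueError("x and alpha must have the same shape")
    if np.any((alpha < 0.0) | (alpha > 1.0)):
        raise ValueError("weights must lie in [0, 1]")

    # Weighted conjugate update: alpha_i scales record i's effective sample size.
    precision = 1.0 / prior_var + alpha.sum() / sigma2
    mean = (prior_mean / prior_var + (alpha * x).sum() / sigma2) / precision
    return mean, 1.0 / precision


def synthesize(x, alpha, sigma2, seed=None):
    """Draw one synthetic dataset of the same size as x (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    mean, var = pseudo_posterior_normal_mean(x, alpha, sigma2)
    mu = rng.normal(mean, np.sqrt(var))            # draw parameter from pseudo posterior
    return rng.normal(mu, np.sqrt(sigma2), size=len(x))  # draw synthetic outcomes


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 100.0]          # last record is an isolated, high-risk value
    alpha = [1.0, 1.0, 1.0, 0.0]        # hypothetical risk-based weights
    print(pseudo_posterior_normal_mean(x, alpha, sigma2=1.0))
    print(synthesize(x, alpha, sigma2=1.0, seed=0))
```

In this toy example the isolated record receives weight 0, so the pseudo posterior mean is driven by the three low-risk records; with all weights equal to 1 the update reduces to the ordinary conjugate posterior.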
