We propose a Bayesian pseudo posterior mechanism to generate record-level synthetic databases equipped with an $(\epsilon,\delta)$-probabilistic differential privacy (pDP) guarantee, where $\delta$ denotes the probability that the privacy loss for an observed database exceeds $\epsilon$. The pseudo posterior mechanism employs a record-indexed, risk-based weight vector with weights in $[0, 1]$ that surgically downweight the likelihood contributions of high-risk records, both for model estimation and for the generation of record-level synthetic data for public release. The pseudo posterior synthesizer constructs a weight for each data record using the Lipschitz bound for that record under a log-pseudo likelihood utility function that generalizes the exponential mechanism (EM) used to construct a formally private data generating mechanism. By selecting weights to remove likelihood contributions with non-finite log-likelihood values, we ensure a finite local privacy guarantee for our pseudo posterior mechanism at every sample size. Our results may be applied to \emph{any} synthesizing model envisioned by the data disseminator in a computationally tractable way that only involves estimation of a pseudo posterior distribution for the parameters, $\theta$, unlike recent approaches that use naturally bounded utility functions implemented through the EM. We specify mild conditions that guarantee the asymptotic contraction of $\delta$ to $0$ over the space of databases. We illustrate our pseudo posterior mechanism on the sensitive family income variable from the Consumer Expenditure Surveys database published by the U.S. Bureau of Labor Statistics. We show that utility is better preserved in the synthetic data under our pseudo posterior mechanism than under the EM, both estimated using the same non-private synthesizer, due to our use of targeted downweighting.
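The weighting idea can be illustrated with a minimal sketch. It assumes a matrix of log-likelihood values evaluated at a finite set of parameter draws (rows) for each record (columns). The function names `risk_weights` and `weighted_lipschitz_bound`, the use of draws to approximate a bound over $\theta$, and the linear rescaling of per-record bounds to $[0, 1]$ are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np


def risk_weights(log_lik):
    """Map per-record log-likelihood bounds to weights in [0, 1].

    log_lik: array of shape (S, n), S parameter draws by n records.
    Assumption: the per-record bound is the largest absolute
    log-likelihood over the supplied draws, rescaled linearly.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    bounds = np.max(np.abs(log_lik), axis=0)   # one bound per record
    finite = np.isfinite(bounds)               # False for inf or NaN bounds
    weights = np.zeros(bounds.shape)           # non-finite records get weight 0
    if not finite.any():
        return weights
    lo, hi = bounds[finite].min(), bounds[finite].max()
    if hi == lo:
        weights[finite] = 1.0                  # all finite records equally risky
    else:
        # Higher bound (higher risk) -> lower weight.
        weights[finite] = 1.0 - (bounds[finite] - lo) / (hi - lo)
    return weights


def weighted_lipschitz_bound(log_lik, weights):
    """Largest absolute weighted log-likelihood over draws and records.

    Records with weight 0 are dropped before multiplying, so a
    non-finite log-likelihood cannot produce 0 * inf = NaN.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = weights > 0
    if not keep.any():
        return 0.0
    return float(np.max(np.abs(weights[keep] * log_lik[:, keep])))
```

Applied to a toy matrix in which one record has a non-finite log-likelihood at some draw, that record receives weight 0 and the weighted bound stays finite, whereas the unweighted bound would be infinite.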