This paper proposes an $\alpha$-lift measure for data privacy and determines the optimal privatization scheme that minimizes the $\alpha$-lift in the watchdog method. When releasing data $X$ that is correlated with sensitive information $S$, the ratio $l(s,x) = \frac{p(s|x)}{p(s)}$ denotes the `lift' of the posterior belief on $S$ and quantifies data privacy. The $\alpha$-lift is proposed as the $L_\alpha$-norm of the lift: $\ell_{\alpha}(x) = \| l(\cdot,x) \|_{\alpha} = (E[l(S,x)^\alpha])^{1/\alpha}$. This is a tunable measure: when $\alpha < \infty$, each lift is weighted by its likelihood of appearing in the dataset (w.r.t. the marginal probability $p(s)$); for $\alpha = \infty$, the $\alpha$-lift reduces to the existing maximum lift. To generate the sanitized data $Y$, we adopt the privacy watchdog method using the $\alpha$-lift: first, obtain the set $\mathcal{X}_{\epsilon}$ containing all $x$ such that $\ell_{\alpha}(x) > e^{\epsilon}$; then, apply the randomization $r(y|x)$ to all $x \in \mathcal{X}_{\epsilon}$, while all other $x \in \mathcal{X} \setminus \mathcal{X}_{\epsilon}$ are published directly. For the resulting $\alpha$-lift $\ell_{\alpha}(y)$, we show that the Sibson mutual information $I_{\alpha}^{S}(S;Y)$ is proportional to $E[ \ell_{\alpha}(y)]$. We further define a stronger measure $\bar{I}_{\alpha}^{S}(S;Y)$ using the worst-case $\alpha$-lift, $\max_{y} \ell_{\alpha}(y)$. We prove that the optimal randomization $r^*(y|x)$ that minimizes both $I_{\alpha}^{S}(S;Y)$ and $\bar{I}_{\alpha}^{S}(S;Y)$ is $X$-invariant, i.e., $r^*(y|x) = R(y), \forall x\in \mathcal{X}_{\epsilon}$, for any probability distribution $R$ over $y \in \mathcal{X}_{\epsilon}$. Numerical experiments show that the $\alpha$-lift can provide flexibility in the privacy-utility tradeoff.
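The definitions above can be illustrated with a minimal sketch in Python. It assumes a small discrete joint distribution given as a nested list `joint[s][x]` with full support on $S$. The function names (`alpha_lift`, `watchdog_set`, `merged_alpha_lift`) and the toy numbers are hypothetical, chosen for illustration only and not taken from the paper. The last function uses the fact that, under an $X$-invariant randomization $r(y|x) = R(y)$, every released $y \in \mathcal{X}_{\epsilon}$ carries the same posterior $p(s \mid X \in \mathcal{X}_{\epsilon})$, so a single $\alpha$-lift value describes the whole randomized set.

```python
import math


def _alpha_norm(lifts, p_s, alpha):
    """L_alpha-norm of a lift vector, weighted by the marginal p(s)."""
    if math.isinf(alpha):
        return max(lifts)  # alpha = infinity: maximum lift
    return sum(p * l ** alpha for p, l in zip(p_s, lifts)) ** (1.0 / alpha)


def alpha_lift(joint, alpha):
    """Return the alpha-lift of every x, where joint[s][x] = p(s, x)."""
    n_s, n_x = len(joint), len(joint[0])
    p_s = [sum(row) for row in joint]
    p_x = [sum(joint[s][x] for s in range(n_s)) for x in range(n_x)]
    result = []
    for x in range(n_x):
        # l(s, x) = p(s|x) / p(s) = p(s, x) / (p(s) p(x))
        lifts = [joint[s][x] / (p_s[s] * p_x[x]) for s in range(n_s)]
        result.append(_alpha_norm(lifts, p_s, alpha))
    return result


def watchdog_set(joint, alpha, eps):
    """Indices x whose alpha-lift exceeds e^eps (the set to be randomized)."""
    threshold = math.exp(eps)
    return [x for x, v in enumerate(alpha_lift(joint, alpha)) if v > threshold]


def merged_alpha_lift(joint, high_risk, alpha):
    """alpha-lift of any y in the randomized set under r(y|x) = R(y)."""
    p_s = [sum(row) for row in joint]
    p_s_set = [sum(row[x] for x in high_risk) for row in joint]  # p(s, X in set)
    p_set = sum(p_s_set)
    lifts = [q / (p * p_set) for q, p in zip(p_s_set, p_s)]
    return _alpha_norm(lifts, p_s, alpha)


# Toy joint distribution (hypothetical): 2 values of S, 3 values of X.
joint = [[0.30, 0.15, 0.05],
         [0.05, 0.15, 0.30]]
high_risk = watchdog_set(joint, alpha=2, eps=0.1)
print(high_risk, merged_alpha_lift(joint, high_risk, alpha=2))
```

In this toy example the two outer symbols of $X$ are flagged and the middle one is published directly. Raising `alpha` toward `math.inf` makes the measure more sensitive to the largest single lift, which is the tunability described above.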