This paper considers the problem of publishing data $X$ while protecting correlated sensitive information $S$. We propose a linear method to generate sanitized data $Y$ over the same alphabet $\mathcal{Y} = \mathcal{X}$ that attains local differential privacy (LDP) and log-lift privacy at the same time. We show that privacy under both LDP and log-lift is inversely proportional to the statistical distance between the conditional probability $P_{Y|S}(x|s)$ and the marginal probability $P_{Y}(x)$: the closer the two probabilities are, the more private $Y$ is. We specify a $P_{Y|S}(x|s)$ that linearly reduces this distance, $|P_{Y|S}(x|s) - P_Y(x)| = (1-\alpha)|P_{X|S}(x|s) - P_X(x)|, \forall s,x$, for some $\alpha \in (0,1]$, and study how to generate $Y$ from the original data $S$ and $X$. The Markov randomization/sanitization scheme $P_{Y|X}(x|x') = P_{Y|S,X}(x|s,x')$ is obtained by solving linear equations. The optimal non-Markov sanitization, i.e., the transition probability $P_{Y|S,X}(x|s,x')$ that depends on $S$, can be determined by maximizing the data utility subject to linear equality constraints. We compute the solution for two linear utility functions: the expected distance and the total variation distance. It is shown that the non-Markov randomization significantly improves data utility, and that the marginal probability $P_X(x)$ remains the same after the linear sanitization method: $P_Y(x) = P_X(x), \forall x \in \mathcal{X}$.
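The linear distance reduction and the Markov scheme can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: the joint distribution `P_SX`, the value of `alpha`, and all function names are hypothetical. It assumes the signed form $P_{Y|S}(x|s) = P_X(x) + (1-\alpha)\,(P_{X|S}(x|s) - P_X(x))$ as the target, and uses the channel $P_{Y|X}(x|x') = (1-\alpha)\,\mathbb{1}\{x = x'\} + \alpha P_X(x)$, which is one solution of the linear equations $\sum_{x'} P_{Y|X}(x|x')\,P_{X|S}(x'|s) = P_{Y|S}(x|s)$ and may differ from the solution derived in the paper.

```python
import math

# Hypothetical toy joint distribution P_{S,X}: rows index s, columns index x.
P_SX = [[0.30, 0.10, 0.10],
        [0.05, 0.25, 0.20]]
alpha = 0.5  # assumed privacy parameter in (0, 1]


def marginals(P_SX):
    """Return the marginals (P_S, P_X) of the joint distribution."""
    P_S = [sum(row) for row in P_SX]
    P_X = [sum(row[x] for row in P_SX) for x in range(len(P_SX[0]))]
    return P_S, P_X


def conditional_x_given_s(P_SX):
    """Return P_{X|S} as a list of rows, one per value of s."""
    return [[p / sum(row) for p in row] for row in P_SX]


def target_y_given_s(P_SX, alpha):
    """Target P_{Y|S}: shrink the gap to P_X by the factor (1 - alpha)."""
    _, P_X = marginals(P_SX)
    return [[P_X[x] + (1 - alpha) * (row[x] - P_X[x]) for x in range(len(P_X))]
            for row in conditional_x_given_s(P_SX)]


def markov_channel(P_SX, alpha):
    """One Markov channel P_{Y|X}; channel[x_in][y_out] (illustrative choice)."""
    _, P_X = marginals(P_SX)
    n = len(P_X)
    return [[(1 - alpha) * (1.0 if y == x else 0.0) + alpha * P_X[y]
             for y in range(n)] for x in range(n)]


def push_through(P_XgS, channel):
    """Compute P_{Y|S}(y|s) = sum_x P_{X|S}(x|s) * P_{Y|X}(y|x)."""
    n = len(channel)
    return [[sum(row[x] * channel[x][y] for x in range(n)) for y in range(n)]
            for row in P_XgS]


def max_abs_log_lift(P_cond, P_marg):
    """Largest |log(P(x|s) / P(x))| over all s and x."""
    return max(abs(math.log(row[x] / P_marg[x]))
               for row in P_cond for x in range(len(P_marg)))


if __name__ == "__main__":
    P_S, P_X = marginals(P_SX)
    P_XgS = conditional_x_given_s(P_SX)
    P_YgS = push_through(P_XgS, markov_channel(P_SX, alpha))
    print("max |log-lift| before:", max_abs_log_lift(P_XgS, P_X))
    print("max |log-lift| after: ", max_abs_log_lift(P_YgS, P_X))
```

In this sketch the channel does not depend on $S$, so it corresponds to the Markov case only; the non-Markov optimization over $P_{Y|S,X}(x|s,x')$ would require a linear program and is not shown.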