The problem of private data disclosure is studied from an information-theoretic perspective. Considering a pair of dependent random variables $(X,Y)$, where $X$ and $Y$ denote the private and useful data, respectively, the following problem is addressed: What is the maximum information that can be revealed about $Y$ (measured by the mutual information $I(Y;U)$, in which $U$ is the revealed data), while disclosing no information about $X$ (captured by the condition of statistical independence, i.e., $X\independent U$, henceforth called \textit{perfect privacy})? We analyze the supremum of $I(Y;U)$ under perfect privacy for two scenarios: \textit{output perturbation} and \textit{full data observation}, which correspond to the cases where the revealed data is the output of a kernel (called a \textit{privacy-preserving mapping}) applied to $Y$ and to $(X,Y)$, respectively. In the case of finite alphabets, the linear algebraic analysis involved in the solution yields upper and lower bounds on the size of the released alphabet and on the maximum utility. In this setting, we propose a privacy-preserving algorithm that is far less complex than the optimal solution and yet provides acceptable performance. When the private data is binary, it is proved that the proposed algorithm achieves the optimal solution, which has a closed-form expression in the full data observation model. Afterwards, it is shown that for jointly Gaussian $(X,Y)$, perfect privacy is not possible in the output perturbation model, in contrast to the full data observation model. Finally, an asymptotic analysis is provided for the output perturbation model to obtain the rate of released information when a sufficiently small leakage is allowed. It is shown that this rate is always finite when perfect privacy is not feasible; otherwise, under mild conditions, the rate becomes unbounded.
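As a minimal illustration of the output perturbation model with finite alphabets, the sketch below builds a small joint distribution and a kernel applied to $Y$ alone, then checks numerically that $I(X;U)=0$ while $I(Y;U)>0$. The joint pmf `P_XY`, the kernel `K`, and the helper `mutual_information` are all hypothetical choices made for this example; the kernel is hand-constructed to satisfy perfect privacy and is not claimed to be the optimal mapping or the algorithm proposed in the paper.

```python
import numpy as np


def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # marginal of A, shape (m, 1)
    pb = joint.sum(axis=0, keepdims=True)   # marginal of B, shape (1, n)
    mask = joint > 0                        # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))


# Hypothetical joint pmf: rows are x in {0, 1}, columns are y in {0, 1, 2}.
# Here P(X | Y=0) = P(X | Y=1), while P(X | Y=2) differs, so X and Y are dependent.
P_XY = np.array([[0.10, 0.20, 0.30],
                 [0.10, 0.20, 0.10]])
P_Y = P_XY.sum(axis=0)                      # (0.2, 0.4, 0.4)

# Hypothetical kernel P(U | Y): rows are u in {a, b}, columns are y.
# U separates y=0 from y=1; for y=2 it is randomized so that
# P(Y=2 | U=u) equals P(Y=2) for both values of u.
p = P_Y[0] / (P_Y[0] + P_Y[1])
K = np.array([[1.0, 0.0, p],
              [0.0, 1.0, 1.0 - p]])

# Output perturbation: X - Y - U is a Markov chain, so
# P(x, u) = sum_y P(x, y) K(u | y) and P(y, u) = P(y) K(u | y).
P_XU = P_XY @ K.T
P_YU = (K * P_Y).T

leakage = mutual_information(P_XU)          # I(X;U), zero up to rounding
utility = mutual_information(P_YU)          # I(Y;U), strictly positive

print(f"I(X;U) = {leakage:.3e} bits")
print(f"I(Y;U) = {utility:.4f} bits")
```

In this toy case the released variable $U$ is independent of $X$ yet still carries information about $Y$, which is the situation the supremum of $I(Y;U)$ under perfect privacy is meant to quantify.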