Local privacy mechanisms, such as k-RR, RAPPOR, and the geo-indistinguishability mechanisms, have become popular because the obfuscation can be performed on the user's side, which avoids the need for a trusted third party. Another important advantage is that each data point is sanitized independently of the others, so different users may apply different levels of obfuscation depending on their privacy requirements, or even entirely different mechanisms depending on the services they are trading their data for. A challenging requirement in this setting is to reconstruct the original distribution of the users' sensitive data from their noisy versions. Existing techniques can only estimate that distribution separately for each obfuscation scheme and its corresponding subset of noisy data. But the smaller the subsets, the less precise the estimates. In this paper we study how to avoid this subset-fractioning problem when combining local privacy mechanisms, thus recovering optimal utility. We focus on the estimation of the original distribution and on the two main methods for estimating it: the matrix-inversion method and the iterative Bayesian update. We consider various combinations of local privacy mechanisms and compare the flexibility and performance of the two methods.
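To make the setting concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes the standard k-RR channel (the true value is reported with probability e^ε / (e^ε + k − 1), any other value with probability 1 / (e^ε + k − 1)), uses the closed-form inverse of that channel for the matrix-inversion estimate, and runs an expectation-maximization style iterative Bayesian update in which every report is weighted by the channel of the mechanism that produced it, so that reports from two user groups with different ε are pooled into one estimate. The function names (`krr_channel`, `invert_krr`, `ibu`), the toy distribution, the group sizes, and the iteration count are all illustrative assumptions.

```python
import math
import random


def krr_channel(k, eps):
    """k-RR channel matrix: C[x][y] = P(report y | true value x)."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)  # report the true value
    q = 1.0 / (math.exp(eps) + k - 1)            # report one specific other value
    return [[p if x == y else q for y in range(k)] for x in range(k)]


def sanitize(x, channel, rng):
    """Obfuscate a single true value x locally, using row x of the channel."""
    return rng.choices(range(len(channel)), weights=channel[x])[0]


def invert_krr(counts, eps):
    """Matrix-inversion estimate for one k-RR subset, in closed form.

    The expected frequency of report y is q + (p - q) * pi_y, so
    pi_y = (freq_y - q) / (p - q). Entries can come out negative on small samples.
    """
    k, n = len(counts), sum(counts)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    return [(c / n - q) / (p - q) for c in counts]


def ibu(groups, k, iters=200):
    """Iterative Bayesian update over pooled reports (illustrative sketch).

    groups: list of (channel, counts) pairs, one per mechanism, where
    counts[y] is the number of users of that mechanism who reported y.
    """
    n = sum(sum(counts) for _, counts in groups)
    theta = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(iters):
        new = [0.0] * k
        for channel, counts in groups:
            for y, c in enumerate(counts):
                if c == 0:
                    continue
                denom = sum(theta[z] * channel[z][y] for z in range(k))
                for x in range(k):
                    # posterior P(true = x | report = y) under the current theta
                    new[x] += c * theta[x] * channel[x][y] / denom
        theta = [v / n for v in new]
    return theta


# Toy experiment: two user groups with different privacy levels (assumed values).
rng = random.Random(0)
true_dist = [0.5, 0.3, 0.15, 0.05]
k = len(true_dist)
groups, per_group_inversion = [], []
for eps in (1.0, 3.0):
    channel = krr_channel(k, eps)
    counts = [0] * k
    for _ in range(5000):
        x = rng.choices(range(k), weights=true_dist)[0]
        counts[sanitize(x, channel, rng)] += 1
    groups.append((channel, counts))
    per_group_inversion.append(invert_krr(counts, eps))

combined_estimate = ibu(groups, k)
print("per-group inversion:", per_group_inversion)
print("combined IBU:", combined_estimate)
```

In this sketch the inversion estimates are computed separately on each subset, whereas the iterative update consumes both subsets at once; the inversion output may also need to be projected back onto the probability simplex when it contains negative entries.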