On the Risks of Collecting Multidimensional Data Under Local Differential Privacy

Privately collecting multiple statistics from a population is a fundamental statistical problem. One approach is to rely on the local model of differential privacy (LDP). Numerous LDP protocols have been developed for frequency estimation of single and multiple attributes. These studies have mainly focused on improving the utility of the algorithms so that the server can perform the estimations accurately. In this paper, we investigate privacy threats (re-identification and attribute inference attacks) against LDP protocols for multidimensional data, following two state-of-the-art solutions for frequency estimation of multiple attributes. To broaden the scope of our study, we also experimentally assess five widely used LDP protocols, namely generalized randomized response, optimal local hashing, subset selection, RAPPOR, and optimal unary encoding. Finally, we propose a countermeasure that improves both utility and robustness against the identified threats. Our contributions can help practitioners who aim to collect users' statistics privately decide which LDP mechanism best fits their needs.
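
For readers unfamiliar with the protocols named above, the following is a minimal sketch of generalized randomized response (GRR) for frequency estimation of a single attribute. It is not code from the paper: the function names `grr_perturb` and `grr_estimate`, the integer encoding of values as `0..k-1`, and the parameter choices are illustrative assumptions. Each user keeps their true value with probability p = e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 values uniformly at random; the server then debiases the observed counts.

```python
import math
import random
from collections import Counter


def grr_perturb(value, k, epsilon, rng):
    """Client side: randomize one value in {0, ..., k-1} (illustrative helper)."""
    e = math.exp(epsilon)
    p = e / (e + k - 1)  # probability of reporting the true value
    if rng.random() < p:
        return value
    # Otherwise pick uniformly among the k-1 values different from `value`.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def grr_estimate(reports, k, epsilon):
    """Server side: unbiased frequency estimates from the randomized reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)  # probability of reporting any specific other value
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the sketch is repeatable
    k, epsilon = 4, 1.0     # assumed domain size and privacy budget
    true_values = [rng.randrange(k) for _ in range(10_000)]
    reports = [grr_perturb(v, k, epsilon, rng) for v in true_values]
    print(grr_estimate(reports, k, epsilon))
```

Individual estimates can be slightly negative or exceed 1 because the debiasing step is linear; they sum to 1 by construction. This sketch covers one attribute only and does not implement the multidimensional solutions or the attacks studied in the paper.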
