Local differential privacy is a promising privacy-preserving model for statistical aggregation of user data that prevents user privacy leakage from the data aggregator. This paper focuses on the problem of estimating the distribution of discrete user values with Local differential privacy. We review and present a comparative analysis on the performance of the existing discrete distribution estimation algorithms in terms of their accuracy on benchmark datasets. Our evaluation benchmarks include real-world and synthetic datasets of categorical individual values with the number of individuals from hundreds to millions and the domain size up to a few hundreds of values. The experimental results show that the Basic RAPPOR algorithm generally performs best for the benchmark datasets in the high privacy regime while the k-RR algorithm often gives the best estimation in the low privacy regime. In the medium privacy regime, the performance of the k-RR, the k-subset, and the HR algorithms are fairly competitive with each other and generally better than the performance of the Basic RAPPOR and the CMS algorithms.
翻译:本地差异隐私是用户数据统计汇总的一个很有希望的隐私保护模式,它防止了数据聚合者对用户隐私的泄露。本文件侧重于估算离散用户值与本地差异隐私的分布问题。我们从基准数据集的准确性角度,对现有离散分布估计算法的性能进行审查和比较分析。我们的评估基准包括个人绝对值真实世界和合成数据集,个人数从亿到百万,域数从几百万不等。实验结果显示基本 RAPPOR算法通常对高隐私系统中的基准数据集最有利,而K-RR算法通常对低隐私制度中的基准数据集提供最佳估计。在中等隐私制度中,K-RR、K子集和人力资源算法的性能相对而言具有竞争力,一般比基本RAPOR和CMS算法的性能要好。