Private collection of statistics from a large distributed population is an important problem, and has led to large-scale deployments by several leading technology companies. The dominant approach requires each user to randomly perturb their input, which yields guarantees in the local differential privacy model. In this paper, we place the various approaches that have been suggested into a common framework, and perform an extensive series of experiments to understand the tradeoffs between different implementation choices. Our conclusion is that for the core problems of frequency estimation and heavy hitter identification, careful choice of algorithms can lead to very effective solutions that scale to millions of users.
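To make the idea of per-user perturbation concrete, the following is a minimal sketch of generalized (k-ary) randomized response, one of the simplest local differential privacy mechanisms for frequency estimation. It is an illustration only and is not claimed to be one of the algorithms evaluated in the paper. The function names `perturb` and `estimate_frequencies`, and the encoding of inputs as integers in `range(domain_size)`, are assumptions made for this example.

```python
import math
import random
from collections import Counter


def perturb(value, domain_size, epsilon, rng):
    """Client side: report the true value with probability p,
    otherwise report one of the other values uniformly at random.

    p = e^eps / (e^eps + k - 1) gives epsilon-local differential privacy
    for a domain of k values (generalized randomized response).
    """
    e = math.exp(epsilon)
    p = e / (e + domain_size - 1)
    if rng.random() < p:
        return value
    # Draw uniformly from the k - 1 values different from `value`.
    other = rng.randrange(domain_size - 1)
    return other if other < value else other + 1


def estimate_frequencies(reports, domain_size, epsilon):
    """Server side: unbiased estimate of each value's true frequency.

    The observed fraction of reports equal to v has expectation
    f_v * p + (1 - f_v) * q, which is inverted to recover f_v.
    """
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + domain_size - 1)   # probability of reporting the truth
    q = 1.0 / (e + domain_size - 1)  # probability of reporting a given other value
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(domain_size)]
```

Each user would call `perturb` locally and send only the result; the aggregator calls `estimate_frequencies` on the collected reports. Individual estimates can be negative or exceed 1 because of the debiasing step, and the error grows with the domain size, which is one reason more elaborate mechanisms are studied for large domains.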