We consider the problem of discrete distribution estimation under locally differential privacy. Distribution estimation is one of the most fundamental estimation problems, which is widely studied in both non-private and private settings. In the local model, private mechanisms with provably optimal sample complexity are known. However, they are optimal only in the worst-case sense; their sample complexity is proportional to the size of the entire universe, which could be huge in practice (e.g., all IP addresses). We show that as long as the target distribution is sparse or approximately sparse (e.g., highly skewed), the number of samples needed could be significantly reduced. The sample complexity of our new mechanism is characterized by the sparsity of the target distribution and only weakly depends on the size the universe. Our mechanism does privatization and dimensionality reduction simultaneously, and the sample complexity will only depend on the reduced dimensionality. The original distribution is then recovered using tools from compressive sensing. To complement our theoretical results, we conduct experimental studies, the results of which clearly demonstrate the advantages of our method and confirm our theoretical findings.
翻译:我们考虑的是本地差异隐私下分散分布估计的问题。分配估计是最为根本的估计问题之一,在非私人和私人环境中都广泛研究这一问题。在本地模式中,已知的私人机制具有最优化的样本复杂度。然而,它们只在最坏的情况下才最理想;其抽样复杂性与整个宇宙的大小成正比,而实际上宇宙的规模可能很大(例如,所有IP地址)。我们表明,只要目标分布稀少或几乎稀少(例如,高度偏斜),所需的样本数量就可以大大减少。我们新机制的样本复杂性特征是目标分布的松散,而且仅微小地取决于宇宙的大小。我们的机制同时进行私有化和尺寸的减少,而抽样复杂性将只取决于缩小的维度。然后利用压缩测测工具恢复原始分布。为了补充我们的理论结果,我们进行实验研究,其结果清楚地表明我们方法的优点,并证实我们的理论结论。