We study the problem of histogram estimation under user-level differential privacy, where the goal is to preserve the privacy of all entries of any single user. While there is abundant literature on this classical problem under the item-level privacy setup where each user contributes only one data point, little has been known for the user-level counterpart. We consider the heterogeneous scenario where both the quantity and distribution of data can be different for each user. We propose an algorithm based on a clipping strategy that almost achieves a two-approximation with respect to the best clipping threshold in hindsight. This result holds without any distribution assumptions on the data. We also prove that the clipping bias can be significantly reduced when the counts are from non-i.i.d. Poisson distributions and show empirically that our debiasing method provides improvements even without such constraints. Experiments on both real and synthetic datasets verify our theoretical findings and demonstrate the effectiveness of our algorithms.
翻译:我们研究了用户一级差异隐私下的直方图估计问题,其目标是保护任何用户所有条目的隐私。虽然在项目一级隐私设置下,每个用户只提供一个数据点,但用户一级对应方却知之甚少。我们考虑了数据数量和分布都可能因用户而异的多种情况。我们根据剪切战略提出了一种算法,几乎在后视最佳剪切阈值方面达到两种一致。这一结果没有关于数据的任何分配假设。我们还证明,如果计数来自非i.i.i.d. Poisson分布,那么剪切偏偏偏偏偏偏偏偏偏偏偏对准的偏差就会大大缩小,并且从经验上表明,即使没有这种限制,我们的偏差方法也能提供改进。对真实和合成数据集的实验证实了我们的理论发现,并证明了我们的算法的有效性。