Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building block in data warehouses. Although many practical sketches for cardinality estimation exist, none provide privacy when merging. We propose the first practical cardinality sketches that are simultaneously mergeable, differentially private (DP), and have low empirical errors. These introduce a novel randomized algorithm for performing logical operations on noisy bits, a tight privacy analysis, and provably optimal estimation. Our sketches dramatically outperform existing theoretical solutions in simulations and on real-world data.
翻译:数据草图是进行区别计算的关键工具,使多套数据能够由接受快速基本估计的简明摘要来代表。由于草图可以被合并为对多组联盟的总结,因此它们是数据仓中的一个基本基石。虽然存在许多基本估计的实际草图,但在合并时没有隐私。我们提出了第一个同时可合并、有区别私人(DP)和有低经验错误的实用基本草图。这些草图引入了一种新颖的随机算法,用于对吵闹的比特进行逻辑操作,进行严格的隐私分析,以及可能的最佳估计。我们的草图大大超过模拟和现实世界数据的现有理论解决方案。