使用概率数据结构在串流数据中使用概率数据结构进行本地化私自模糊计数 (Local Differentially Private Fuzzy Counting in Stream Data using Probabilistic Data Structure)

Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR and Apple's count-mean sketch (CMS) algorithm propose privacy preserving mechanisms for count estimation in large volumes of data using probabilistic data structures like counting Bloom filter and CMS. However, these existing methods fall short in providing a sound solution for real-time streaming data applications. In this work, we propose a novel (local) Differentially private mechanism that provides high utility for the streaming data count estimation problem with similar or even lower privacy budgets while providing: a) fuzzy counting to report counts of related or similar items (for instance to account for typing errors and data variations), and b) improved querying efficiency to reduce the response time for real-time querying of counts. We provide formal proofs for privacy and utility guarantees and present extensive experimental evaluation of our algorithm using real and synthetic English words datasets for both the exact and fuzzy counting scenarios. Our privacy preserving mechanism substantially outperforms the prior work in terms of lower querying time, significantly higher utility (accuracy of count estimation) under similar or lower privacy guarantees, at the cost of communication overhead.

翻译：对流数据中项目的计数进行保留隐私的估计,在几个现实世界情景中发现应用,包括自动校正和交通管理应用程序的字数。RAPPOR和苹果的计数平均草图(CMS)的最近作品提出了利用概率性数据结构(如计算Bloom过滤器和CMS)进行大量数据估算的隐私保护机制。然而,这些现有方法在为实时流数据应用提供合理解决方案方面不足。在这项工作中,我们提议了一个新颖的(地方)有区别的私人机制,为以类似或更低的隐私预算流数据计数问题提供高的效用,同时提供:(a) 报告相关或类似项目计数的模糊计数(例如记录打字错误和数据变异),以及(b) 提高查询效率,以减少实时查点计数的响应时间。我们提供了隐私和效用保障的正式证据,并用真实和合成英文数据集对我们的算法进行广泛的实验性评价,用于精确和模糊的计数情景。我们的隐私保护机制大大超过先前的测测算时间,或者在类似保密性估算的低空成本下(精确性计算)。