Sketches are widely used for frequency estimation of data with a large domain. However, sketches-based frequency estimation faces more challenges when considering privacy. Local differential privacy (LDP) is a solution to frequency estimation on sensitive data while preserving the privacy. LDP enables each user to perturb its data on the client-side to protect the privacy, but it also introduces errors to the frequency estimations. The hash collisions in the sketches make the estimations for low-frequent items even worse. In this paper, we propose a two-phase frequency estimation framework for data with a large domain based on an LDP learned sketch, which separates the high-frequent and low-frequent items to avoid the errors caused by hash collisions. We theoretically proved that the proposed method satisfies LDP and it is more accurate than the state-of-the-art frequency estimation methods including Apple-CMS, Apple-HCMS and FLH. The experimental results verify the performance of our method.
翻译:以草图为基础的频率估计在考虑隐私时面临更多的挑战。当地差异隐私(LDP)是在保护隐私的同时对敏感数据进行频率估计的一种解决办法。LDP使每个用户都能在客户方面干扰其数据以保护隐私,但它也给频率估计带来错误。草图中的散列碰撞使得低频项目的估计更加糟糕。在本文中,我们根据LDP所学的草图,为大域数据提出了一个两阶段频率估计框架,将高频和低频项目分开,以避免因Hash碰撞造成的错误。我们理论上证明,拟议的方法符合LDP,而且比最新的频率估计方法(包括苹果-CMS、苹果-HCMS和FLH)更准确。实验结果验证了我们方法的性能。