To remain competitive, Internet companies collect and analyse user data to improve user experience. Frequency estimation is a widely used statistical tool, but it can conflict with the relevant privacy regulations. Privacy-preserving analytic methods based on differential privacy have been proposed, but they require either a large user base or a trusted server; hence they may give big companies an unfair advantage while limiting the growth opportunities of smaller organizations. To address this issue, this paper proposes a fair, privacy-preserving, sampling-based frequency estimation method and provides a relation between its privacy guarantee, output accuracy, and number of participants. We design decentralized privacy-preserving aggregation mechanisms using multi-party computation techniques and establish that, for a limited number of participants and a fixed privacy level, our mechanisms perform better than those based on traditional perturbation methods, thereby giving smaller companies a fair growth opportunity. We further propose an architectural model that supports weighted aggregation to cater for users with different privacy requirements; compared to unweighted aggregation, it provides a more accurate estimate. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods.
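The abstract does not spell out the aggregation protocol, so the following is only a minimal sketch of the general idea of sampling-based frequency estimation over a decentralized aggregate. It assumes additive secret sharing (a common multi-party computation building block) among a few non-colluding aggregators; the names `share`, `estimate_counts`, `PRIME`, and `sample_rate` are hypothetical and are not taken from the paper. The sketch is neither the authors' mechanism nor a calibrated differential-privacy guarantee.

```python
import random

# Modulus for additive sharing; an arbitrary illustrative choice (assumption).
PRIME = 2**61 - 1


def share(value, n_parties, rng):
    """Split an integer into n_parties additive shares modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def estimate_counts(user_items, domain, sample_rate, n_parties=3, seed=0):
    """Estimate per-item counts from secret-shared, locally sampled reports.

    Each user flips a local coin: with probability sample_rate the user
    reports a one-hot vector for their item, otherwise an all-zero vector.
    Every entry is secret-shared, so no single aggregator sees a report.
    """
    # NOTE: a real deployment would use a cryptographically secure RNG;
    # random.Random is used here only to keep the sketch reproducible.
    rng = random.Random(seed)
    index = {item: i for i, item in enumerate(domain)}
    # party_sums[p][i] is aggregator p's running sum of shares for item i.
    party_sums = [[0] * len(domain) for _ in range(n_parties)]

    for item in user_items:
        sampled = rng.random() < sample_rate
        for i in range(len(domain)):
            bit = 1 if (sampled and index[item] == i) else 0
            for p, s in enumerate(share(bit, n_parties, rng)):
                party_sums[p][i] = (party_sums[p][i] + s) % PRIME

    # Aggregators publish only their per-item sums; combining them
    # reveals the total sampled count and nothing per user.
    sampled_counts = [
        sum(party_sums[p][i] for p in range(n_parties)) % PRIME
        for i in range(len(domain))
    ]
    # Rescale by the sampling rate to estimate the full-population count.
    return {item: sampled_counts[index[item]] / sample_rate for item in domain}
```

For example, `estimate_counts(["a", "b", "a"], ["a", "b"], sample_rate=0.5)` returns an estimate of each item's count; dividing by the number of users gives estimated frequencies. Lower sampling rates reveal less about any individual at the cost of higher variance, which is the kind of privacy, accuracy, and participant-count trade-off the abstract refers to.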