Differential privacy has been widely adopted in academia and industry because its formal guarantee on individual privacy supports compliance with privacy legislation such as GDPR. However, the tools capable of achieving differential privacy are poorly understood, and it is unclear what to expect from existing tools when implementing privacy protection. This obstacle limits the further growth of private applications. This paper reviews and evaluates state-of-the-art open-source differential privacy tools from different domains under various evaluation criteria and privacy settings. In particular, we examine the performance of three differential privacy tools for machine learning, two for statistical queries, and four for synthetic data generation. We test all the tools on both continuous and categorical data and quantify their performance under different privacy budgets and data sizes with respect to utility loss and system overhead. The accumulated evaluation results reveal several patterns that users can follow to configure the tools optimally, and they provide preliminary guidelines on tool selection under different criteria. Finally, we openly release our evaluation code repository, a framework that users can reuse to further evaluate the studied tools and others. We expect this work to provide comprehensive insight into the performance of the dominant existing privacy tools and a concrete reference for a potentially large community of developers of private applications, thus narrowing the gap between conceptual differential privacy and the development of private functionality.