Differential privacy has become a popular privacy-preserving method in data analysis, query processing, and machine learning, which adds noise to the query result to avoid leaking privacy. Sensitivity, or the maximum impact of deleting or inserting a tuple on query results, determines the amount of noise added. Computing the sensitivity of some simple queries such as counting query is easy, however, computing the sensitivity of complex queries containing join operations is challenging. Global sensitivity of such a query is unboundedly large, which corrupts the accuracy of the query answer. Elastic sensitivity and residual sensitivity offer upper bounds of local sensitivity to reduce the noise, but they suffer from either low accuracy or high computational overhead. We propose two fast query sensitivity estimation methods based on sampling and sketch respectively, offering competitive accuracy and higher efficiency compared to the state-of-the-art methods.
翻译:差分隐私已成为数据分析、查询处理和机器学习中流行的保护隐私的方法,它向查询结果添加噪声,以避免泄露隐私。敏感性(或者删除或插入元组对查询结果的最大影响)决定了添加的噪声量。计算简单查询(例如计数查询)的敏感性很容易,但计算包含联接操作的复杂查询的敏感性是具有挑战性的。这样一个查询的全局敏感性是无界的,从而破坏了查询答案的准确性。弹性敏感性和残余敏感性提供局部敏感度的上界,以减少噪声,但它们要么精度低,要么计算开销高。我们提出了两种基于采样和草图的快速查询敏感度估计方法,与现有最先进方法相比,这两种方法都提供了竞争良好的准确性和更高的效率。