Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex - but very common - machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data for general compact metric spaces. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmicaly slowly.
翻译:差异隐私是一个数学概念,它提供了信息理论安全保障。虽然差异隐私已成为数据共享中保障隐私的一个事实上的标准,但已知的实现这一标准的已知机制存在一些严重限制。通常只对固定的、先验的一组询问提供效用保障。此外,对于更复杂但非常常见的机器学习任务,例如集群或分类,没有实用的保障。在本文中,我们克服了其中的一些限制。在使用度量隐私(即差异隐私的有力概括化)的同时,我们开发了一种多元时间算法,从数据集中创建了一种私人措施。这种私人措施使我们能够高效地构建对广泛的统计分析工具准确的私人合成数据。此外,我们证明私人措施和合成数据对于一般紧凑空间的合成数据是一种无端的微量结果。我们构建中的一个关键要素是一种新的超常规的随机行走走法,其步骤的联合分布与独立随机变量的分布一样经常,但与原始的日志相对缓慢。