Making the Most of Parallel Composition in Differential Privacy

From arXiv. This is the full version of the paper with the same title, to appear in the proceedings of the 22nd Privacy Enhancing Technologies Symposium (PETS 2022).

We show that the `optimal' use of the parallel composition theorem corresponds to finding the size of the largest subset of queries that `overlap' on the data domain, a quantity we call the \emph{maximum overlap} of the queries. It has previously been shown that a certain instance of this problem, formulated in terms of determining the sensitivity of the queries, is NP-hard, but also that it is possible to use graph-theoretic algorithms, such as finding the maximum clique, to approximate query sensitivity. In this paper, we consider a significant generalization of the aforementioned instance which encompasses both a wider range of differentially private mechanisms and a broader class of queries. We show that for a particular class of predicate queries, determining if they are disjoint can be done in time polynomial in the number of attributes. For this class, we show that the maximum overlap problem remains NP-hard as a function of the number of queries. However, we show that efficient approximate solutions exist by relating maximum overlap to the clique and chromatic numbers of a certain graph determined by the queries. The link to chromatic number allows us to use more efficient approximate algorithms, which cannot be done for the clique number as it may underestimate the privacy budget. Our approach is defined in the general setting of $f$-differential privacy, which subsumes standard pure differential privacy and Gaussian differential privacy. We prove the parallel composition theorem for $f$-differential privacy. We evaluate our approach on synthetic and real-world data sets of queries. We show that the approach can scale to large domain sizes (up to $10^{20000}$), and that its application can reduce the noise added to query answers by up to 60\%.
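The link between query overlap and graph coloring can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each query is a conjunction of closed interval predicates (one per attribute) and that each query is answered with a pure $\epsilon$-differentially private mechanism. The names `overlaps`, `overlap_graph`, `greedy_coloring`, and `budget_upper_bound` are hypothetical. Under these assumptions, two queries are disjoint exactly when their intervals fail to intersect on some shared attribute, which takes time linear in the number of attributes. Queries that share a color are pairwise disjoint, so each color class costs $\epsilon$ by parallel composition, and the number of colors times $\epsilon$ is a safe (possibly loose) upper bound on the total budget.

```python
from itertools import combinations

# Assumption: a query is a dict mapping attribute name -> closed interval (lo, hi).
# An attribute missing from the dict is treated as unconstrained.

def overlaps(q1, q2):
    """Return True if some record could satisfy both queries."""
    for attr in q1.keys() & q2.keys():
        lo1, hi1 = q1[attr]
        lo2, hi2 = q2[attr]
        if max(lo1, lo2) > min(hi1, hi2):
            return False  # intervals on this attribute do not intersect
    return True

def overlap_graph(queries):
    """Adjacency sets: vertices are query indices, edges join overlapping queries."""
    adj = {i: set() for i in range(len(queries))}
    for i, j in combinations(range(len(queries)), 2):
        if overlaps(queries[i], queries[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def greedy_coloring(adj):
    """Greedy proper coloring, visiting vertices in order of decreasing degree."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def budget_upper_bound(queries, eps_per_query):
    """Number of colors times eps: an upper bound on the total privacy budget."""
    color = greedy_coloring(overlap_graph(queries))
    n_colors = max(color.values()) + 1 if color else 0
    return n_colors * eps_per_query

queries = [
    {"age": (0, 17)},
    {"age": (18, 64)},
    {"age": (65, 120)},
    {"age": (10, 30)},  # overlaps the first two queries
]
print(budget_upper_bound(queries, 0.5))
```

In this toy example, sequential composition over the four queries would charge $4\epsilon$, while the coloring needs only two classes and so charges $2\epsilon$. The greedy coloring never uses fewer colors than the chromatic number, which in turn is at least the clique number and the maximum overlap, so the bound errs on the side of overestimating the budget rather than underestimating it.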
