We propose new differential privacy solutions for settings in which external \emph{invariants} and \emph{integer} constraints are simultaneously enforced on the data product. These requirements arise in real-world applications of private data curation, including the public release of the 2020 U.S. Decennial Census, and they pose a great challenge to producing provably private data products with adequate statistical usability. We propose \emph{integer subspace differential privacy} to rigorously articulate the privacy guarantee when data products maintain both the invariants and integer characteristics, and we demonstrate the composition and post-processing properties of our proposal. To address the challenge of sampling from a potentially highly restricted discrete space, we devise a pair of unbiased additive mechanisms, the generalized Laplace and the generalized Gaussian mechanisms, by solving the Diophantine equations defined by the constraints. The proposed mechanisms have good accuracy, with errors exhibiting sub-exponential and sub-Gaussian tail probabilities, respectively. To implement our proposal, we design an MCMC algorithm and supply an empirical convergence assessment using estimated upper bounds on the total variation distance via $L$-lag coupling. We demonstrate the efficacy of our proposal with applications to a synthetic problem with intersecting invariants, a sensitive contingency table with known margins, and the 2010 Census county-level demonstration data with mandated fixed state population totals.
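To make the idea of integer noise that respects an invariant concrete, the following is a minimal sketch for the simplest case of a single fixed total. It is not the algorithm of this paper: the function name `sample_sum_zero_noise`, the target $p(z) \propto \exp(-\epsilon \|z\|_1)$, the move set, and the toy counts are all assumptions chosen for illustration, and no privacy calibration is claimed. The sampler is a plain Metropolis random walk over integer vectors whose entries sum to zero, so adding the noise leaves the total unchanged.

```python
import math
import random


def sample_sum_zero_noise(n, eps, steps=5000, seed=0):
    """Metropolis sampler for integer noise z in Z^n with sum(z) == 0.

    Illustrative target (an assumption, not the paper's definition):
    p(z) proportional to exp(-eps * ||z||_1) on the sum-zero lattice.
    """
    rng = random.Random(seed)
    z = [0] * n  # the zero vector trivially satisfies the invariant
    for _ in range(steps):
        # Pick an ordered pair (i, j) of distinct coordinates.
        i, j = rng.sample(range(n), 2)
        # Proposal: shift one unit from coordinate j to coordinate i.
        # This keeps every state an integer vector with sum zero.
        old = abs(z[i]) + abs(z[j])
        new = abs(z[i] + 1) + abs(z[j] - 1)
        # The proposal is symmetric, so the acceptance ratio is p(new)/p(old).
        if rng.random() < math.exp(-eps * (new - old)):
            z[i] += 1
            z[j] -= 1
    return z


# Hypothetical confidential counts whose total must be released exactly.
counts = [120, 45, 80, 55]
noise = sample_sum_zero_noise(len(counts), eps=0.5)
released = [c + e for c, e in zip(counts, noise)]
```

In this sketch the released vector is integer-valued and has the same total as the confidential counts by construction. Non-negativity of the released counts, more general invariants (which would require a lattice basis for the integer solutions of the constraint equations), and convergence diagnostics are deliberately left out.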