Many data applications have certain invariant constraints due to practical needs. Data curators who employ differential privacy need to respect such constraints on the sanitized data product as a primary utility requirement. Invariants challenge the formulation, implementation, and interpretation of privacy guarantees. We propose subspace differential privacy to honestly characterize the dependence of the sanitized output on confidential aspects of the data. We discuss two design frameworks that convert well-known differentially private mechanisms, such as the Gaussian and Laplace mechanisms, into subspace differentially private ones that respect the invariants specified by the curator. For linear queries, we discuss the design of near-optimal mechanisms that minimize the mean squared error. Subspace differentially private mechanisms obviate the need for post-processing due to invariants, preserve transparency and statistical intelligibility of the output, and can be suitable for distributed implementation. We showcase the proposed mechanisms on the 2020 Census Disclosure Avoidance demonstration data and on a spatio-temporal dataset of mobile access point connections on a large university campus.
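To make the conversion idea concrete: for a linear query with linear invariants of the form A x = c, one natural construction is to project the noise onto the null space of A before adding it, so the released answer satisfies the invariants exactly by construction. The minimal Python sketch below illustrates this; it is an illustrative rendering, not the authors' reference implementation. The function name, the invariant matrix A, and the noise scale sigma are all assumptions of the sketch, and calibrating sigma to a target subspace differential privacy guarantee is omitted.

```python
import numpy as np

def subspace_gaussian_mechanism(x, A, sigma, rng=None):
    """Release x plus Gaussian noise confined to null(A), so the
    invariants A @ x (e.g., fixed marginal totals) are reproduced
    exactly by the sanitized output, with no post-processing step.
    Calibration of sigma to the privacy guarantee is not shown."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    # pinv(A) @ A is the orthogonal projector onto the row space of A;
    # subtracting it from the identity projects onto null(A), the
    # invariant-preserving directions.
    P_null = np.eye(n) - np.linalg.pinv(A) @ A
    noise = P_null @ rng.normal(0.0, sigma, size=n)
    return x + noise

# Example: a three-cell histogram whose total count is a public invariant.
x = np.array([40.0, 35.0, 25.0])
A = np.array([[1.0, 1.0, 1.0]])   # one invariant row: the sum of all cells
y = subspace_gaussian_mechanism(x, A, sigma=2.0)
assert np.allclose(A @ y, A @ x)  # exact total preserved; cells are perturbed
```

Because the noise never leaves null(A), the constrained directions of the data are released exactly, which is precisely why the privacy guarantee must be stated on the complementary subspace rather than on the full data domain.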