Formal Privacy for Partially Private Data

Differential privacy (DP) requires that any statistic based on confidential data be released with additional noise for privacy. Such a restriction can be logistically impossible to achieve, for example due to policy-mandated disclosure in the present or unsanitized data releases in the past. Still, we want to preserve DP-style privacy guarantees for future data releases in excess of this pre-existing public information. In this paper, we present a privacy formalism, $\epsilon$-DP relative to $Z$, extending Pufferfish privacy, that accommodates DP-style semantics in the presence of public information. We introduce two mechanisms for releasing partially private data (PPD) and prove their desirable properties such as asymptotic negligibility of errors due to privacy and congeniality with as-is public information. We demonstrate theoretically and empirically how statistical inference from PPD degrades with post-processing, and propose alternative inference algorithms for estimating statistics from PPD. This collection of the framework, mechanisms, and inferential tools aims to help practitioners overcome the real logistical barriers introduced when public information is an unavoidable component of the data release process.
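For readers unfamiliar with the baseline that the abstract builds on: a mechanism $M$ satisfies standard $\epsilon$-DP if $\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S]$ for all neighboring datasets $D, D'$ and all output sets $S$. The sketch below illustrates what "released with additional noise" means in that standard setting, using the classical Laplace mechanism on a counting query. It is not one of the paper's two PPD mechanisms and does not implement $\epsilon$-DP relative to $Z$; the function names `laplace_mechanism` and `private_count` are hypothetical, and the sensitivity of 1 assumes neighboring datasets differ by adding or removing one record.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon.

    This is the classical Laplace mechanism for standard epsilon-DP,
    shown only as background; it is not the paper's PPD mechanism.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of the records that satisfy `predicate`.

    Assumption: adding or removing one record changes the count by at
    most 1, so the query has sensitivity 1.
    """
    true_count = sum(1 for record in records if predicate(record))
    return laplace_mechanism(true_count, 1.0, epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 58, 62, 19]
    noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
    print(f"noisy count of records with age >= 40: {noisy:.2f}")
```

Smaller values of `epsilon` give a larger noise scale and therefore stronger privacy at the cost of accuracy. The setting described in the abstract differs in that some statistic of the confidential data is already public without noise, which this baseline does not account for.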
