To quantify the trade-off between the growing demand for open data sharing and concerns about disclosing sensitive information, statistical data privacy (SDP) methodology analyzes data release mechanisms that sanitize outputs derived from confidential data. Two dominant frameworks exist: statistical disclosure control (SDC) and the more recent differential privacy (DP). Despite differences in framing, SDC and DP share the same statistical problems at their core. For inference problems, we may either design optimal release mechanisms and associated estimators that satisfy bounds on disclosure risk, or adjust existing sanitized output to create new optimal estimators. Both problems rely on uncertainty quantification in evaluating risk and utility. In this review, we discuss the statistical foundations common to both SDC and DP, highlight major developments in SDP, and present exciting open research problems in private inference.