Differential privacy (DP) mechanisms are increasingly proposed to afford public release of sensitive information, offering strong theoretical guarantees for privacy, yet limited empirical evidence of utility. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics or performance over a query workload. The ability of these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. We instantiate our methodology over a benchmark of recent peer-reviewed papers that analyze public datasets in the ICPSR social science repository. We model quantitative claims computationally to automate the experimental workflow, and model qualitative claims by reproducing visualizations and comparing the results manually. We then generate DP synthetic datasets using multiple state-of-the-art mechanisms and estimate the likelihood that these conclusions will hold. We find that, for reasonable privacy regimes, state-of-the-art DP synthesizers are able to achieve high epistemic parity for several papers in our benchmark. However, some papers, and particularly some specific findings, are difficult to reproduce for any of the synthesizers. Given these results, we advocate for a new class of mechanisms that can reorder the priorities for DP data synthesis: favor stronger guarantees for utility (as measured by epistemic parity) and offer privacy protection with a focus on application-specific threat models and risk assessment.