Simulation-based, Finite-sample Inference for Privatized Data

Privacy protection methods, such as differentially private mechanisms, introduce noise into the resulting statistics, which often yields complex and intractable sampling distributions. In this paper, we propose using the simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests based on privatized statistics. We show that this methodology applies to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as clamping), and improves on other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error control, even after accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to compute the confidence intervals and $p$-values.
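To make the idea of simulation-based inference on a privatized statistic concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical setting chosen purely for illustration: data drawn from N(theta, 1), a clamped mean released with Laplace noise, and a confidence set obtained by inverting a two-sided Monte Carlo test over a grid of candidate values. The random "seeds" are drawn once and reused for every candidate, which is in the spirit of the repro sample idea. The function names `privatize_mean` and `mc_pvalue`, the grid search, and all parameter values are assumptions of this sketch.

```python
import numpy as np

def privatize_mean(x, lo, hi, eps, noise):
    """Clamp to [lo, hi], average over the last axis, and add Laplace noise
    scaled to the sensitivity (hi - lo) / n of the clamped mean."""
    n = x.shape[-1]
    return np.clip(x, lo, hi).mean(axis=-1) + (hi - lo) / (n * eps) * noise

def mc_pvalue(s_obs, s_sim):
    """Two-sided Monte Carlo p-value. The +1 terms count the observed
    statistic itself, so the p-value stays valid for a finite number of draws."""
    r = s_sim.size
    left = (1 + np.sum(s_sim <= s_obs)) / (r + 1)
    right = (1 + np.sum(s_sim >= s_obs)) / (r + 1)
    return min(1.0, 2.0 * min(left, right))

rng = np.random.default_rng(0)
n, eps, lo, hi, alpha, R = 100, 1.0, -3.0, 3.0, 0.05, 999

# Hypothetical "observed" privatized statistic, generated with theta = 1.
theta_true = 1.0
s_obs = privatize_mean(theta_true + rng.standard_normal(n), lo, hi, eps, rng.laplace())

# Draw the simulation seeds once and reuse them for every candidate theta.
z = rng.standard_normal((R, n))   # seeds for the data
w = rng.laplace(size=R)           # seeds for the privacy noise

# Invert the test: keep every candidate theta that is not rejected at level alpha.
grid = np.linspace(lo, hi, 601)
pvals = np.array([mc_pvalue(s_obs, privatize_mean(t + z, lo, hi, eps, w)) for t in grid])
accepted = grid[pvals > alpha]
ci = (accepted.min(), accepted.max())
print(f"s_obs = {s_obs:.3f}, approximate 95% confidence interval = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Because the simulated statistics pass through the same clamping and noise steps as the observed one, any bias from clamping is reproduced in the reference distribution rather than ignored. The grid search here is a crude stand-in for the efficient numerical algorithms the abstract mentions, and the reported interval is only the range of accepted grid points.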
