Simulation-based, Finite-sample Inference for Privatized Data

Privacy protection methods, such as differentially private mechanisms, introduce noise into the released statistics, which often leads to complex and intractable sampling distributions. In this paper, we propose to use the simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests based on privatized statistics. We show that this methodology is applicable to a wide variety of private inference problems, appropriately accounts for biases introduced by privacy mechanisms (such as by clamping), and improves over other state-of-the-art inference methods, such as the parametric bootstrap, in terms of the coverage and type I error of the private inference. We also develop significant improvements and extensions of the repro sample methodology for general models (not necessarily related to privacy), including 1) modifying the procedure to guarantee coverage and type I error rates, even after accounting for Monte Carlo error, and 2) proposing efficient numerical algorithms to compute the confidence intervals and $p$-values.
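To make the idea of simulation-based inference on a privatized statistic concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy model that does not appear in the abstract: data drawn from N(mu, 1) with known unit variance, clamped to [lo, hi], averaged, and released with Laplace noise of scale (hi - lo) / (n * eps). All function names, parameter values, and the grid are hypothetical. The sketch draws one fixed set of random seeds, re-simulates the privatized statistic at each candidate mu, and keeps the candidates whose rank-based Monte Carlo p-value exceeds alpha. Ranking the observed statistic among `reps + 1` values is one standard way to account for Monte Carlo error.

```python
import random

def privatized_mean(xs, lo, hi, eps, noise):
    """Clamp to [lo, hi], average, and add Laplace noise scaled to sensitivity (hi - lo) / n."""
    n = len(xs)
    clamped_mean = sum(min(max(x, lo), hi) for x in xs) / n
    return clamped_mean + noise * (hi - lo) / (n * eps)

def laplace(rng):
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def mc_p_value(s_obs, s_sim):
    """Two-sided Monte Carlo p-value from the rank of s_obs among the simulated statistics."""
    low = 1 + sum(s <= s_obs for s in s_sim)
    high = 1 + sum(s >= s_obs for s in s_sim)
    return min(1.0, 2.0 * min(low, high) / (len(s_sim) + 1))

def simulation_interval(s_obs, n, lo, hi, eps, grid, alpha=0.05, reps=199, seed=0):
    """Return the hull of grid points not rejected at level alpha (None if all are rejected)."""
    rng = random.Random(seed)
    # Seeds are drawn once and reused for every candidate mu.
    z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(reps)]
    w = [laplace(rng) for _ in range(reps)]
    accepted = []
    for mu in grid:
        s_sim = [privatized_mean([mu + zi for zi in zr], lo, hi, eps, wr)
                 for zr, wr in zip(z, w)]
        if mc_p_value(s_obs, s_sim) > alpha:
            accepted.append(mu)
    return (min(accepted), max(accepted)) if accepted else None

# Usage with hypothetical settings: true mean 1.0, clamping to [0, 3], eps = 1.
data_rng = random.Random(1)
n, lo, hi, eps = 100, 0.0, 3.0, 1.0
x = [data_rng.gauss(1.0, 1.0) for _ in range(n)]
s_obs = privatized_mean(x, lo, hi, eps, laplace(data_rng))
grid = [-1.0 + 0.05 * k for k in range(101)]  # candidate values of mu in [-1, 4]
interval = simulation_interval(s_obs, n, lo, hi, eps, grid)
print(interval)
```

Because the simulated statistics pass through the same clamping step as the observed one, the bias from clamping is reproduced in the reference distribution rather than ignored. The grid search here is a brute-force stand-in for the efficient numerical algorithms the abstract mentions.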
