Differential privacy is a restriction on data processing algorithms that provides strong confidentiality guarantees for individual records in the data. However, research on proper statistical inference, that is, research on properly quantifying the uncertainty of the (noisy) sample estimate regarding the true value in the population, is currently still limited. This paper proposes and evaluates several strategies to compute valid differentially private confidence intervals for the median. Instead of computing a differentially private point estimate and deriving its uncertainty, we directly estimate the interval bounds and discuss why this approach is superior if ensuring privacy is important. We also illustrate that addressing both sources of uncertainty--the error from sampling and the error from protecting the output--simultaneously should be preferred over simpler approaches that incorporate the uncertainty in a sequential fashion. We evaluate the performance of the different algorithms under various parameter settings in extensive simulation studies and demonstrate how the findings could be applied in practical settings using data from the 1940 Decennial Census.
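To illustrate what estimating the interval bounds directly can look like, here is a minimal Python sketch. It is not one of the algorithms proposed in the paper, and every name in it (`median_ci_ranks`, `dp_quantile`, `dp_median_ci`, the clipping bounds `lo` and `hi`, the even split of the privacy budget) is an assumption made for illustration. It picks the order-statistic ranks of the classical distribution-free confidence interval for the median, then releases each bound with an exponential-mechanism quantile estimator. It handles the two sources of uncertainty sequentially, so it does not widen the interval to account for the privacy noise and nominal coverage is not guaranteed; this is the simpler kind of approach the abstract argues against.

```python
import math
import random


def median_ci_ranks(n, alpha=0.05):
    """1-based ranks (l, u) such that [x_(l), x_(u)] covers the population
    median with probability >= 1 - alpha (non-private, distribution-free)."""
    cdf, l = 0.0, 0
    for k in range(n + 1):
        cdf += math.comb(n, k) / 2**n  # P(B = k) for B ~ Binomial(n, 1/2)
        if cdf > alpha / 2:
            break
        l = k + 1  # largest l so far with P(B <= l - 1) <= alpha / 2
    if l == 0:
        raise ValueError("sample too small for the requested confidence level")
    return l, n - l + 1


def dp_quantile(sorted_clipped, rank, epsilon, lo, hi, rng):
    """Exponential mechanism over the gaps between consecutive data points.
    Gap i holds values with exactly i data points at or below them; its
    utility is -|i - rank|, which has sensitivity 1."""
    z = [lo] + sorted_clipped + [hi]
    log_w = []
    for i in range(len(z) - 1):
        width = z[i + 1] - z[i]
        if width > 0:
            log_w.append(math.log(width) - epsilon * abs(i - rank) / 2)
        else:
            log_w.append(-math.inf)  # empty gap (ties) can never be chosen
    top = max(log_w)  # subtract the maximum to avoid underflow
    weights = [math.exp(lw - top) for lw in log_w]
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.uniform(z[i], z[i + 1])


def dp_median_ci(data, epsilon, lo, hi, alpha=0.05, seed=None):
    """Assumed design: clip to [lo, hi], spend epsilon / 2 on each bound.
    Note: `random` is not a cryptographically secure noise source."""
    if not lo < hi:
        raise ValueError("need lo < hi")
    rng = random.Random(seed)
    x = sorted(min(max(v, lo), hi) for v in data)
    l, u = median_ci_ranks(len(x), alpha)
    lower = dp_quantile(x, l - 1, epsilon / 2, lo, hi, rng)
    upper = dp_quantile(x, u, epsilon / 2, lo, hi, rng)
    # Reordering is post-processing and does not affect the privacy guarantee.
    return min(lower, upper), max(lower, upper)
```

A call such as `dp_median_ci(values, epsilon=1.0, lo=0.0, hi=100.0)` returns a pair of noisy bounds inside the clipping range. By basic composition the two releases together satisfy epsilon-differential privacy, provided `lo` and `hi` are chosen without looking at the data.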