Recent research in differential privacy has demonstrated that (sub)sampling can amplify the level of privacy protection. For example, for $\epsilon$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is approximately $r\epsilon$ (for small $\epsilon$) if a privacy parameter of $\epsilon$ is used to protect the output computed from the sample. In this paper, we study whether this amplification effect can be exploited systematically to improve the accuracy of the privatized estimate. Specifically, assuming the agency has information on the full population, we ask under which circumstances accuracy gains can be expected if the privatized estimate is computed on a random sample instead of the full population. We find that accuracy gains can be achieved in certain regimes. However, gains can typically be expected only if the sensitivity of the output with respect to small changes in the database does not depend too strongly on the size of the database. We focus only on algorithms that achieve differential privacy by adding noise to the final output, and we illustrate the accuracy implications for two commonly used statistics: the mean and the median. We see our research as a first step towards understanding the conditions required for accuracy gains in practice, and we hope that these findings will stimulate further research that broadens the scope of the differential privacy algorithms and outputs considered.
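To make the trade-off concrete, the following is a minimal sketch, not the authors' method. It assumes the commonly cited amplification bound $\epsilon' = \ln\bigl(1 + r(e^{\epsilon} - 1)\bigr)$, which reduces to $\epsilon' \approx r\epsilon$ for small $\epsilon$. It also assumes a bounded mean over $n$ values in $[0, B]$ with replace-one neighbors (sensitivity $B/n$), released with the Laplace mechanism. The helper names are hypothetical. The sketch inverts the bound to find the per-sample budget that yields a target population-level $\epsilon$, then compares only the Laplace noise scales of the two strategies, ignoring sampling error, which would further penalize the sample-based estimate.

```python
import math


def amplified_epsilon(eps_sample: float, r: float) -> float:
    """Population-level epsilon when an eps_sample-DP mechanism runs on a
    random sample with rate r (assumed bound: ln(1 + r * (e^eps - 1)))."""
    return math.log1p(r * math.expm1(eps_sample))


def sample_epsilon_for_target(eps_target: float, r: float) -> float:
    """Invert the assumed bound: the per-sample epsilon whose amplified
    value equals eps_target."""
    return math.log1p(math.expm1(eps_target) / r)


def laplace_scale_mean(bound: float, n: int, eps: float) -> float:
    """Hypothetical helper: Laplace scale for the mean of n values in
    [0, bound], using replace-one sensitivity bound / n."""
    return bound / (n * eps)


if __name__ == "__main__":
    N, r, eps, B = 10_000, 0.1, 0.1, 1.0  # illustrative values only
    n = int(r * N)

    eps_s = sample_epsilon_for_target(eps, r)
    full_scale = laplace_scale_mean(B, N, eps)
    sample_scale = laplace_scale_mean(B, n, eps_s)

    print(f"amplified eps for eps_s={eps}: {amplified_epsilon(eps, r):.5f}")
    print(f"per-sample budget eps_s: {eps_s:.4f}")
    print(f"noise scale, full population: {full_scale:.6f}")
    print(f"noise scale, sample:          {sample_scale:.6f}")
```

Because the mean's sensitivity grows as $1/n$ when the database shrinks, the larger per-sample budget does not offset the smaller sample under these assumptions. This is in line with the condition stated above that gains require a sensitivity that does not depend too strongly on the database size.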