In recent years, it has been claimed that releasing accurate statistical information on a database is likely to allow its complete reconstruction. Differential privacy has been suggested as the appropriate methodology to prevent these attacks. These claims have recently been taken very seriously by the U.S. Census Bureau and led them to adopt differential privacy for releasing U.S. Census data. This in turn has caused consternation among users of the Census data due to the lack of accuracy of the protected outputs. It has also brought legal action against the U.S. Department of Commerce. In this paper, we trace the origins of the claim that releasing information on a database automatically makes it vulnerable to being exposed by reconstruction attacks and we show that this claim is, in fact, incorrect. We also show that reconstruction can be averted by properly using traditional statistical disclosure control (SDC) techniques. We further show that the geographic level at which exact counts are released is even more relevant to protection than the actual SDC method employed. Finally, we caution against confusing reconstruction and reidentification: using the quality of reconstruction as a metric of reidentification results in exaggerated reidentification risk figures.