In 2018, the US Census Bureau designed a new data reconstruction and re-identification attack and tested it against its 2010 data release. The specific attack executed by the Bureau allows an attacker to infer the race and ethnicity of respondents with an average precision of 75% for 85% of the respondents, assuming that the attacker knows the correct age, sex, and address of the respondents. The Bureau interpreted the attack as exceeding its privacy standards, and so introduced stronger privacy protections for the 2020 Census in the form of the TopDown Algorithm (TDA). This paper demonstrates that race and ethnicity can be inferred from the TDA-protected census data with substantially better precision and recall, using less prior knowledge: only the respondents' address. Race and ethnicity can be inferred with an average precision of 75% for 98% of the respondents, and with 100% precision for 11% of the respondents. The inference is done by simply assuming that the respondent's race/ethnicity is the majority race/ethnicity of the respondent's census block. The conclusion to draw from this simple demonstration is NOT that the Bureau's data releases lack adequate privacy protections. Indeed, it is the purpose of the data releases to allow this kind of inference. The problem, rather, is that the Bureau's criterion for measuring privacy is flawed and overly pessimistic.
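The majority-based inference described above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function name `majority_inference`, the `min_share` threshold, and the toy block counts are all hypothetical, chosen only to show how precision and the share of respondents covered follow from per-block counts.

```python
def majority_inference(blocks, min_share=0.0):
    """Guess the majority group for every respondent in each block.

    blocks: dict mapping a block id to a dict of {group: count}.
    min_share: only make a guess in blocks where the majority group's
        share is at least this value (hypothetical knob for trading
        coverage against precision).
    Returns (precision, coverage) over all respondents.
    """
    total = sum(sum(counts.values()) for counts in blocks.values())
    predicted = 0  # respondents for whom a guess is made
    correct = 0    # respondents whose guess matches their group
    for counts in blocks.values():
        size = sum(counts.values())
        if size == 0:
            continue
        top = max(counts.values())  # size of the majority group
        if top / size >= min_share:
            predicted += size
            correct += top
    precision = correct / predicted if predicted else 0.0
    coverage = predicted / total if total else 0.0
    return precision, coverage


# Toy, made-up block counts (not census data).
toy_blocks = {
    "block_A": {"group_X": 9, "group_Y": 1},
    "block_B": {"group_X": 5, "group_Y": 5},
    "block_C": {"group_Y": 10},
}

all_blocks = majority_inference(toy_blocks)
homogeneous_only = majority_inference(toy_blocks, min_share=1.0)
```

With no threshold, every respondent receives a guess and precision equals the overall share of respondents who belong to their block's majority group. Restricting the guess to fully homogeneous blocks gives 100% precision on a smaller share of respondents, which mirrors the two operating points quoted above.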