A reconstruction attack on a private dataset $D$ takes as input some publicly accessible information about the dataset and produces a list of candidate elements of $D$. We introduce a new class of data reconstruction attacks based on randomized methods for non-convex optimization. We empirically demonstrate that our attacks can not only reconstruct full rows of $D$ from aggregate query statistics $Q(D)\in \mathbb{R}^m$, but can do so in a way that reliably ranks reconstructed rows by their odds of appearing in the private data. This ranking provides a signature that could be used to prioritize reconstructed rows for further actions such as identity theft or hate crime. We also design a sequence of baselines for evaluating reconstruction attacks. Our attacks significantly outperform baselines that rely only on access to a public distribution or population from which the private dataset $D$ was sampled, demonstrating that they exploit information in the aggregate statistics $Q(D)$ and not simply the overall structure of the distribution. In other words, the queries $Q(D)$ permit reconstruction of elements of this dataset, not of the distribution from which $D$ was drawn. These findings are established both on 2010 U.S. decennial Census data and queries and on Census-derived American Community Survey datasets. Taken together, our methods and experiments illustrate the risks of releasing numerically precise aggregate statistics of a large dataset, and provide further motivation for the careful application of provably private techniques such as differential privacy.
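To make the general idea concrete, the following is a minimal sketch of a randomized reconstruction attack on a toy dataset. It is not the attack evaluated in this work. It assumes binary attributes, takes $Q(D)$ to be the set of all 2-way marginal counts, assumes the attacker knows the number of rows, and substitutes a simple random-restart local search for the non-convex optimizer. The names `marginals`, `loss`, `one_run`, and `attack` are hypothetical. Each run starts from a random candidate dataset and tries to match the published statistics; rows are then ranked by how many independent runs produced them.

```python
import random
from collections import Counter
from itertools import combinations, product


def marginals(rows, d):
    """Toy Q(D): all 2-way marginal counts over d binary attributes."""
    stats = []
    for i, j in combinations(range(d), 2):
        for a, b in product((0, 1), repeat=2):
            stats.append(sum(1 for r in rows if r[i] == a and r[j] == b))
    return stats


def loss(rows, target, d):
    """Squared error between the candidate's statistics and the target."""
    return sum((s - t) ** 2 for s, t in zip(marginals(rows, d), target))


def one_run(target, n, d, steps, rng):
    """One randomized local search: flip single bits, keep non-worsening moves."""
    rows = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    best = loss(rows, target, d)
    for _ in range(steps):
        r, c = rng.randrange(n), rng.randrange(d)
        rows[r][c] ^= 1
        new = loss(rows, target, d)
        if new <= best:
            best = new
        else:
            rows[r][c] ^= 1  # revert a move that increased the loss
    return {tuple(r) for r in rows}  # distinct rows of this candidate


def attack(target, n, d, runs=20, steps=300, seed=0):
    """Rank candidate rows by the number of runs in which they appear."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        votes.update(one_run(target, n, d, steps, rng))
    return votes.most_common()


# Hypothetical private dataset; only marginals(D, 4) is given to the attack.
D = [(0, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 0),
     (0, 1, 1, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
target = marginals(D, 4)
ranked = attack(target, n=len(D), d=4)
for row, count in ranked[:5]:
    print(row, count, row in D)
```

The vote count per row plays the role of the ranking signal: rows that recur across many independent randomized runs are the ones the statistics constrain most tightly. Whether the top-ranked rows are true members of $D$ depends on the dataset, the queries, and the optimizer, so the sketch prints membership instead of assuming it.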