The de-anonymization of users from anonymized microdata through matching or aligning with publicly available correlated databases has recently attracted scientific interest. While most rigorous analyses of database matching have focused on random-distortion models, adversarial-distortion models have been largely missing from the relevant literature. In this work, motivated by synchronization errors in the sampling of time-indexed microdata, matching (alignment) of random databases under adversarial column deletions is investigated. It is assumed that a constrained adversary, which observes the anonymized database, can delete up to a $\delta$ fraction of the columns (attributes) to hinder matching and preserve privacy. Column histograms of the two databases are utilized as permutation-invariant features to detect the column deletion pattern chosen by the adversary. The detection of the column deletion pattern is then followed by an exact row (user) matching scheme. The worst-case analysis of this two-phase scheme yields a sufficient condition for the successful matching of the two databases under the near-perfect recovery condition. A more detailed investigation of the error probability leads to a tight necessary condition on the database growth rate and, in turn, to a single-letter characterization of the adversarial matching capacity. This adversarial matching capacity is shown to be significantly lower than the "random" matching capacity, where the column deletions occur randomly. Overall, our results analytically demonstrate the privacy advantages of adversarial mechanisms over random ones during the publication of anonymized time-indexed data.
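The two-phase scheme described above (histogram-based detection of the deletion pattern, then exact row matching) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on assumptions not stated in the abstract: the retained entries are noiseless, the retained columns keep their original order, all column histograms of the original database are distinct, and all rows remain distinct after deletion. The function names, the parameter values in `simulate`, and the random choice of deleted columns (a stand-in for the adversary) are hypothetical.

```python
import random
from collections import Counter


def column_histogram(db, j):
    """Histogram of column j; unchanged by any permutation of the rows."""
    return frozenset(Counter(row[j] for row in db).items())


def detect_retained_columns(original, observed):
    """Phase 1: align each observed column with the next original column
    that has an identical histogram (assumes histograms are distinct)."""
    n = len(original[0])
    hists = [column_histogram(original, i) for i in range(n)]
    retained, i = [], 0
    for j in range(len(observed[0])):
        target = column_histogram(observed, j)
        while i < n and hists[i] != target:
            i += 1
        if i == n:
            raise ValueError("no order-preserving alignment found")
        retained.append(i)
        i += 1
    return retained


def match_rows(original, observed, retained):
    """Phase 2: exact matching of rows restricted to the retained columns."""
    index = {}
    for r, row in enumerate(original):
        key = tuple(row[i] for i in retained)
        if key in index:
            raise ValueError("duplicate rows after deletion; matching is ambiguous")
        index[key] = r
    return [index[tuple(row)] for row in observed]


def simulate(m=300, n=20, alphabet=5, delta=0.25, seed=0):
    """Toy experiment: shuffle rows, delete a delta fraction of columns,
    then check that both phases recover the ground truth."""
    rng = random.Random(seed)
    original = [[rng.randrange(alphabet) for _ in range(n)] for _ in range(m)]
    # Stand-in for the adversary: columns are deleted at random here.
    deleted = set(rng.sample(range(n), int(delta * n)))
    kept = [i for i in range(n) if i not in deleted]
    perm = list(range(m))
    rng.shuffle(perm)  # hidden row permutation (anonymization)
    observed = [[original[r][i] for i in kept] for r in perm]
    detected = detect_retained_columns(original, observed)
    matched = match_rows(original, observed, detected)
    return detected == kept and matched == perm


if __name__ == "__main__":
    print(simulate())
```

The greedy left-to-right alignment suffices in this sketch only because each histogram is assumed to identify a unique column. When histograms collide, the deletion pattern becomes ambiguous, which is the kind of event a worst-case analysis has to control.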