The re-identification, or de-anonymization, of users from anonymized data by matching it against publicly available, correlated user data has raised privacy concerns, leading to the use of obfuscation as a complementary measure to anonymization. Recent research provides a fundamental understanding of the conditions under which privacy attacks, in the form of database matching, succeed in the presence of obfuscation. Motivated by synchronization errors stemming from the sampling of time-indexed databases, this paper presents a unified framework that considers both obfuscation and synchronization errors, and investigates the matching of databases under noisy entry repetitions. By investigating different structures for the repetition pattern, replica detection and seeded deletion detection algorithms are devised, and sufficient and necessary conditions for successful matching are derived. Finally, the impact on the results of several variations of the underlying assumptions, such as an adversarial deletion model, seedless database matching, and the zero-rate regime, is discussed. Overall, our results provide insights into the privacy-preserving publication of anonymized and obfuscated time-indexed data, as well as into the closely related problem of the capacity of synchronization channels.