When working with user data, providing well-defined privacy guarantees is paramount. In this work we aim to manipulate and share an entire sparse dataset with a third party privately. Differential privacy has emerged as the gold standard of privacy; however, when it comes to sharing sparse datasets, we prove, as one of our main results, that \emph{any} differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee. Hence we need to opt for other privacy notions, such as $k$-anonymity, that are better at preserving utility in this context. We present a variation of $k$-anonymity, which we call smooth $k$-anonymity, and design simple algorithms that efficiently provide smooth $k$-anonymity. We further perform an empirical evaluation to back our theoretical guarantees, and show that our algorithm improves the performance of downstream machine learning tasks on anonymized data.