With the advent of big data and the birth of data markets that sell personal information, protecting individuals' privacy is of utmost importance. The classical response is anonymization, i.e., sanitizing the information that can directly or indirectly allow users to be re-identified. The most popular solution in the literature is k-anonymity. However, k-anonymity is hard to achieve on a continuous stream of data, and also when the number of dimensions becomes high. In this paper, we propose a novel anonymization property called z-anonymity. Unlike k-anonymity, it can be achieved with zero delay on data streams and is well suited to high-dimensional data. The idea at the base of z-anonymity is to release an attribute (an atomic piece of information) about a user only if at least z - 1 other users have presented the same attribute in a past time window. z-anonymity is weaker than k-anonymity, since it does not consider combinations of attributes but treats each attribute individually. We present a probabilistic framework that maps the z-anonymity property onto the k-anonymity property. Our results show that, with a proper choice of the z-anonymity parameters, the data curator obtains a k-anonymized dataset with a precisely measurable probability. We also evaluate a real use case, in which we consider the website visits of a population of users, and show that z-anonymity can work in practice for obtaining k-anonymity too.
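The release rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ZAnonymizer`, the method `observe`, and the choice to treat the window as the half-open interval (t - window, t] are assumptions made for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict


class ZAnonymizer:
    """Illustrative zero-delay filter for the z-anonymity release rule.

    Hypothetical names and window semantics; not taken from the paper.
    """

    def __init__(self, z, window):
        self.z = z            # minimum number of distinct users (current one included)
        self.window = window  # length of the past time window
        # attribute -> {user: timestamp of that user's latest observation}
        self.last_seen = defaultdict(dict)

    def observe(self, t, user, attribute):
        """Record (t, user, attribute) and return True if it may be released."""
        users = self.last_seen[attribute]
        users[user] = t
        # Assumption: an observation at time ts is in the window iff ts > t - window.
        expired = [u for u, ts in users.items() if ts <= t - self.window]
        for u in expired:
            del users[u]
        # Release only if at least z - 1 *other* users showed the attribute,
        # i.e. at least z distinct users when the current one is counted.
        return len(users) >= self.z


if __name__ == "__main__":
    anonymizer = ZAnonymizer(z=3, window=10)
    stream = [(0, "a", "x"), (1, "b", "x"), (2, "c", "x"), (20, "a", "x")]
    for t, user, attribute in stream:
        print(t, user, attribute, anonymizer.observe(t, user, attribute))
```

Each decision is taken as soon as the observation arrives, which is what makes the rule zero-delay. Attributes are counted independently of one another, so the sketch also shows why the property by itself does not protect combinations of attributes.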