Given a query result over a large database, why-provenance can be used to compute the part of the database that is necessary to produce this result, consisting of so-called witnesses. If the database contains personal data, privacy protection has to prevent the publication of these witnesses. This implies a natural conflict of interest between publishing original data (provenance) and protecting these data (privacy). In this paper, privacy goes beyond the concept of personal data protection: we give an extended definition of privacy that also covers intellectual property protection. If the provenance information is not sufficient to reconstruct a query result, additional data such as witnesses or provenance polynomials have to be published to guarantee traceability. Nevertheless, publishing this provenance information might be a problem if (significantly) more tuples than necessary can be derived from the original database. Privacy policies can already be violated at this point if the provenance information contains quasi-identifiers. With this poster, we point out fundamental problems and discuss first proposals for solutions.
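To make the notion of witnesses concrete, the following is a minimal sketch rather than the method of this work. It assumes a hypothetical toy relation `PATIENTS` and a single projection query, `SELECT DISTINCT diagnosis FROM patients`; the relation, its attributes, and both helper functions are illustrative assumptions. For such a projection, each witness of a result value is a single source tuple, and publishing the witnesses exposes the complete source tuples, including quasi-identifiers such as the zip code.

```python
from collections import defaultdict

# Hypothetical toy relation: (tuple id, name, zip code, diagnosis).
PATIENTS = [
    ("t1", "Alice", "18055", "flu"),
    ("t2", "Bob", "18055", "flu"),
    ("t3", "Carol", "18057", "asthma"),
]


def why_provenance(rows):
    """Why-provenance of SELECT DISTINCT diagnosis FROM patients.

    Maps each result value to its set of witnesses. For this simple
    projection, every witness is a set holding one source tuple id.
    """
    witnesses = defaultdict(set)
    for tuple_id, _name, _zip_code, diagnosis in rows:
        witnesses[diagnosis].add(frozenset({tuple_id}))
    return dict(witnesses)


def witness_tuples(rows, provenance, result_value):
    """Return the source tuples exposed by publishing the witnesses."""
    ids = set().union(*provenance[result_value])
    return [row for row in rows if row[0] in ids]


provenance = why_provenance(PATIENTS)
exposed = witness_tuples(PATIENTS, provenance, "flu")
```

In this sketch, tracing the single result value `flu` back to its witnesses reveals the names and zip codes of two patients, which illustrates the conflict between traceability and privacy described above.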