Data anonymization is an approach to privacy-preserving data release aimed at preventing the re-identification of participants, and it is an important alternative to differential privacy in applications that cannot tolerate noisy data. Existing algorithms for enforcing $k$-anonymity in released data assume that the curator performing the anonymization has complete access to the original data. Reasons for limiting this access range from it being undesirable to it being entirely infeasible. This paper explores ideas -- objectives, metrics, protocols, and extensions -- for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of $k$-anonymity. We suggest trust (the amount of information provided to the curator) and privacy (the anonymity of the participants) as the primary objectives of such a framework. We describe a class of protocols aimed at achieving these goals, proposing new metrics of privacy in the process and proving related bounds. We conclude by discussing a natural extension of this work that completely removes the need for a central curator.
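To make the central notion concrete, the following minimal sketch (not part of the paper, and assuming records are represented as dicts keyed by column name) checks whether a released table satisfies $k$-anonymity: every combination of values over a chosen set of quasi-identifier columns must occur in at least $k$ records.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True iff every quasi-identifier group contains at least k records.

    records: list of dicts (one per row)
    quasi_identifiers: list of column names treated as quasi-identifiers
    k: anonymity threshold
    """
    # Count how many rows share each combination of quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical example: generalized ZIP code and age band are the
# quasi-identifiers; "diagnosis" is the sensitive attribute.
table = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "cold"},
    {"zip": "148**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "148**", "age": "20-29", "diagnosis": "asthma"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))  # True: each group has 2 rows
print(is_k_anonymous(table, ["zip", "age"], k=3))  # False: groups are too small
```

Note that this check only verifies the property on an already-released table; the algorithms the abstract refers to are those that transform the original data (e.g. by generalization or suppression) so that the check passes.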