Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of anonymity of a dataset with respect to some of the most common anonymization techniques: k-anonymity, ($\alpha$,k)-anonymity, $\ell$-diversity, entropy $\ell$-diversity, recursive (c,$\ell$)-diversity, basic $\beta$-likeness, enhanced $\beta$-likeness, t-closeness and $\delta$-disclosure privacy. For datasets with more than one sensitive attribute, two approaches are proposed for evaluating these techniques. The main strength of this library is that it produces a full report of the parameters fulfilled for each of the techniques mentioned above, requiring only the set of quasi-identifiers and the set of sensitive attributes. We present the implemented methods together with the attacks they prevent, a description of the library, usage examples of the different functions, and the impact and possible applications that can be developed. Finally, some aspects to be incorporated in future updates are proposed.
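To make "the parameters fulfilled" concrete for three of the listed techniques, the sketch below computes the largest k for k-anonymity, the largest $\ell$ for (distinct) $\ell$-diversity, and the largest $\ell$ for entropy $\ell$-diversity, given only the quasi-identifiers and one sensitive attribute. This is a minimal illustration under stated assumptions, not pyCANON's API: the function names are hypothetical, records are assumed to be a list of dicts whose quasi-identifier values are already generalized, and the toy data is invented for the example.

```python
import math
from collections import Counter, defaultdict

def equivalence_classes(records, quasi_idents):
    """Group records that share the same quasi-identifier values (hypothetical helper)."""
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[q] for q in quasi_idents)].append(row)
    return list(classes.values())

def k_anonymity(records, quasi_idents):
    """Largest k such that every equivalence class contains at least k records."""
    return min(len(c) for c in equivalence_classes(records, quasi_idents))

def l_diversity(records, quasi_idents, sensitive):
    """Largest l such that every class has at least l distinct sensitive values."""
    return min(len({row[sensitive] for row in c})
               for c in equivalence_classes(records, quasi_idents))

def entropy_l_diversity(records, quasi_idents, sensitive):
    """Largest l such that every class satisfies H(class) >= log(l), i.e. min exp(H)."""
    best = math.inf
    for c in equivalence_classes(records, quasi_idents):
        counts = Counter(row[sensitive] for row in c)
        h = -sum((n / len(c)) * math.log(n / len(c)) for n in counts.values())
        best = min(best, math.exp(h))
    return best

# Invented toy data: "age" and "zip" are quasi-identifiers, "disease" is sensitive.
records = [
    {"age": "30-39", "zip": "280**", "disease": "flu"},
    {"age": "30-39", "zip": "280**", "disease": "cancer"},
    {"age": "30-39", "zip": "280**", "disease": "flu"},
    {"age": "40-49", "zip": "281**", "disease": "hiv"},
    {"age": "40-49", "zip": "281**", "disease": "flu"},
]
QI = ["age", "zip"]

print(k_anonymity(records, QI))                              # 2
print(l_diversity(records, QI, "disease"))                   # 2
print(round(entropy_l_diversity(records, QI, "disease"), 2))  # 1.89
```

Grouping by the tuple of quasi-identifier values once, and then taking a minimum over classes, reflects that each of these properties holds for the whole table only if it holds in its weakest equivalence class.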