$k$-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers

The protection of private information is a crucial issue in data-driven research and business contexts. Typically, techniques like anonymisation or (selective) deletion are introduced in order to allow data sharing, e.g. in the case of collaborative research endeavours. For use with anonymisation techniques, the $k$-anonymity criterion is one of the most popular, with numerous scientific publications on different algorithms and metrics. Anonymisation techniques often require changing the data and thus necessarily affect the results of machine learning models trained on the underlying data. In this work, we conduct a systematic comparison and detailed investigation into the effects of different $k$-anonymisation algorithms on the results of machine learning models. We investigate a set of popular $k$-anonymisation algorithms with different classifiers and evaluate them on different real-world datasets. Our systematic evaluation shows that with an increasingly strong $k$-anonymity constraint, the classification performance generally degrades, but to varying degrees and strongly depending on the dataset and anonymisation method. Furthermore, Mondrian can be considered as the method with the most appealing properties for subsequent classification.
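
To make the $k$-anonymity criterion and the two operations named in the title concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the standard definition of $k$-anonymity (every combination of quasi-identifier values is shared by at least $k$ records). The toy records, the column names `age` and `zip`, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical quasi-identifier columns for this toy example.
QUASI_IDENTIFIERS = ("age", "zip")


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


def generalise_age(record, width=10):
    """Generalisation: replace the exact age with a coarser interval."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


def suppress_small_groups(records, quasi_identifiers, k):
    """Suppression: drop records whose group has fewer than k members."""
    sizes = group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] >= k
    ]


# Made-up toy data; "label" stands in for a classification target.
records = [
    {"age": 23, "zip": "12345", "label": 0},
    {"age": 27, "zip": "12345", "label": 1},
    {"age": 31, "zip": "12345", "label": 1},
    {"age": 38, "zip": "12345", "label": 0},
    {"age": 45, "zip": "12345", "label": 1},
]

generalised = [generalise_age(r) for r in records]
anonymised = suppress_small_groups(generalised, QUASI_IDENTIFIERS, k=2)
```

In this toy case generalisation alone leaves one record in a group of its own, so it is suppressed to reach $k = 2$. Both steps remove information a classifier could have used (exact ages, and one training record), which is the kind of effect the study measures.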
