The exponential growth of collected, processed, and shared microdata has given rise to concerns about individuals' privacy. As a result, laws and regulations have emerged to control what organisations do with microdata and how they protect it. Statistical Disclosure Control seeks to reduce the risk of confidential information disclosure by de-identifying the data. Such de-identification is achieved through privacy-preserving techniques. However, de-identification usually results in loss of information, with a possible impact on data analysis precision and model predictive performance. The main goal is to protect individuals' privacy while maintaining the interpretability of the data, i.e. its usefulness. Statistical Disclosure Control is an expanding area that needs further exploration, since there is still no solution that guarantees optimal privacy and utility. This survey focuses on all steps of the de-identification process. We present existing privacy-preserving techniques used in microdata de-identification, privacy measures suitable for several disclosure types, and information loss and predictive performance measures. In this survey, we discuss the main challenges raised by privacy constraints, describe the main approaches to handle these obstacles, review taxonomies of privacy-preserving techniques, provide a theoretical analysis of existing comparative studies, and raise multiple open issues.