The exponential growth of collected, processed, and shared data has raised concerns about individuals' privacy. Consequently, various laws and regulations have been established to govern how organizations handle and safeguard data. One prominent approach is Statistical Disclosure Control, which aims to minimize the risk of exposing confidential information by de-identifying it. This de-identification is achieved through specific privacy-preserving techniques. However, a trade-off exists: de-identification often entails a loss of information, which may degrade the accuracy of data analysis and the predictive performance of models. The overarching goal remains to safeguard individual privacy while preserving the data's utility, that is, its overall usefulness. Despite advances in Statistical Disclosure Control, the field continues to evolve, and no definitive solution yet strikes an optimal balance between privacy and utility. This survey examines the intricate processes of de-identification. We outline the privacy-preserving techniques currently employed in microdata de-identification, discuss privacy measures tailored to various disclosure scenarios, and assess metrics for information loss and predictive performance. We address the primary challenges posed by privacy constraints, survey the predominant strategies for mitigating them, categorize privacy-preserving techniques, offer a theoretical assessment of current comparative research, and highlight numerous open issues in the field.
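To make the privacy–utility trade-off concrete, the following is a minimal sketch of microdata de-identification using two common privacy-preserving operations, suppression of a direct identifier and generalization of quasi-identifiers. The record fields, interval width, and truncation rule are illustrative assumptions, not techniques prescribed by this survey.

```python
# Illustrative de-identification sketch (assumed record schema and thresholds).

def generalize_age(age, width=10):
    """Coarsen an exact age into an interval: less disclosure risk,
    more information loss."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def deidentify(records):
    """Suppress the direct identifier and generalize quasi-identifiers."""
    out = []
    for r in records:
        out.append({
            "age": generalize_age(r["age"]),   # quasi-identifier: generalized
            "zip": r["zip"][:3] + "**",        # quasi-identifier: truncated
            "diagnosis": r["diagnosis"],       # sensitive attribute kept for utility
        })  # the "name" field (direct identifier) is dropped entirely
    return out

records = [
    {"name": "Alice", "age": 34, "zip": "90210", "diagnosis": "flu"},
    {"name": "Bob",   "age": 37, "zip": "90213", "diagnosis": "cold"},
]
print(deidentify(records))
```

After de-identification the two records share the same quasi-identifier values ("30-39", "902**"), which lowers re-identification risk, but an analyst can no longer recover exact ages or ZIP codes, illustrating the information loss the survey's metrics aim to quantify.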