Privacy Preservation in Federated Learning: Insights from the GDPR Perspective

Along with the blooming of applications and services based on AI and machine learning (ML), data privacy and security have become a critical challenge. Conventionally, data is collected and aggregated in a data centre, where machine learning models are trained. This centralised approach exposes personal data to severe privacy risks, including leakage, misuse, and abuse. Furthermore, in the era of the Internet of Things and big data, in which data is inherently distributed, transferring vast amounts of data to a data centre for processing is cumbersome. This is due not only to the difficulty of transferring and sharing data across data sources but also to the challenge of complying with rigorous data protection regulations and complicated administrative procedures, such as those imposed by the EU General Data Protection Regulation (GDPR). In this respect, federated learning (FL) emerges as a prospective solution that facilitates distributed collaborative learning without disclosing the original training data, whilst naturally complying with the GDPR. However, recent research has demonstrated that retaining data and computation on-device in FL is not sufficient to guarantee privacy. This is because the ML model parameters exchanged between parties in an FL system still carry sensitive information, which can be exploited in privacy attacks. Therefore, FL systems should be equipped with efficient privacy-preserving techniques to comply with the GDPR. This article systematically surveys the state-of-the-art privacy-preserving techniques that can be employed in FL, as well as how these techniques mitigate data security and privacy risks. Furthermore, we provide insights into the challenges, along with prospective approaches that follow the GDPR regulatory guidelines, that an FL system must address to comply with the GDPR.
