Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users while avoiding moving the data off-device. However, although the data never leaves users' devices, privacy still cannot be guaranteed, since significant computations on users' training data are shared in the form of trained local models. These local models have recently been shown to pose a substantial privacy threat through privacy attacks such as model inversion. As a remedy, Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL by guaranteeing that the server can only learn the global aggregated model update, but not the individual model updates. While SA ensures that no additional information about an individual model update is leaked beyond the aggregated model update, there are no formal guarantees on how much privacy FL with SA can actually offer, since information about an individual user's dataset can still potentially leak through the aggregated model computed at the server. In this work, we perform a first analysis of the formal privacy guarantees for FL with SA. Specifically, we use Mutual Information (MI) as a quantification metric and derive upper bounds on how much information about each user's dataset can leak through the aggregated model update. For the FedSGD aggregation algorithm, our theoretical bounds show that the amount of privacy leakage decreases linearly with the number of users participating in FL with SA. To validate our theoretical bounds, we use an MI Neural Estimator to empirically evaluate the privacy leakage under different FL setups on both the MNIST and CIFAR10 datasets. Our experiments verify our theoretical bounds for FedSGD, showing that privacy leakage decreases as the number of users and the local batch size grow, and increases with the number of training rounds.
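To make the secure aggregation setting concrete, the following is a minimal toy sketch (not the protocol analyzed in this work) of additive pairwise masking, the mechanism underlying SA: each pair of users agrees on a random mask that one adds and the other subtracts, so the masks cancel in the sum and the server recovers only the aggregated model update. All variable names and sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, dim = 4, 8  # hypothetical number of users and model-update dimension

# Hypothetical local model updates (e.g., FedSGD gradients), one per user.
updates = [rng.normal(size=dim) for _ in range(num_users)]

# Pairwise masks: masks[(i, j)] is a secret shared between users i and j (i < j).
masks = {(i, j): rng.normal(size=dim)
         for i in range(num_users) for j in range(i + 1, num_users)}

def masked_update(i):
    """What user i actually sends: its update plus/minus its pairwise masks."""
    x = updates[i].copy()
    for j in range(num_users):
        if i < j:
            x += masks[(i, j)]
        elif j < i:
            x -= masks[(j, i)]
    return x

# The server sums the masked updates; the masks cancel, so it only learns the aggregate.
aggregate_at_server = sum(masked_update(i) for i in range(num_users))
assert np.allclose(aggregate_at_server, sum(updates))
```

Each masked update on its own is statistically independent of the user's true update, yet, as the analysis in this work quantifies via mutual information, the aggregate that the server does learn can still leak information about each user's dataset.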