Recent attacks have shown that user data can be reconstructed from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance, as federated learning typically uses the FedAvg algorithm. It is generally accepted that reconstructing data from FedAvg updates is much harder than from FedSGD updates because: (i) the intermediate weight updates are unobserved, (ii) the order of inputs matters, and (iii) the order of labels changes every epoch. In this work, we propose a new optimization-based attack that successfully attacks FedAvg by addressing these challenges. First, we solve the optimization problem using automatic differentiation, which forces a simulation of the client's local update on the reconstructed labels and inputs so that it matches the received client update. Second, we address the unknown input order by treating images at different epochs as independent during optimization, while relating them through a permutation-invariant prior. Third, we reconstruct the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate that we successfully reconstruct, on average, >45% of the client's images from realistic FedAvg updates computed over 10 local epochs of 10 batches of 5 images each, compared to only <10% for the baseline. These findings indicate that many real-world federated learning implementations based on FedAvg are vulnerable.
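To make the first step concrete, the following is a minimal sketch, not the authors' implementation, of differentiating through a simulated FedAvg client update to optimize reconstructed inputs. It is written in JAX under illustrative assumptions: the names `model_apply`, `observed_update`, the SGD learning rate, and the batch schedule are all hypothetical placeholders that a real attacker would have to know or estimate.

```python
# Minimal sketch (assumptions labeled): recover client inputs by matching a
# simulated FedAvg update to the update the server actually observed.

import jax
import jax.numpy as jnp

def loss_fn(params, x, y, model_apply):
    # Standard cross-entropy; `y` is assumed one-hot here.
    logits = model_apply(params, x)
    return -jnp.mean(jnp.sum(y * jax.nn.log_softmax(logits), axis=-1))

def simulate_fedavg(params, batches, model_apply, lr=0.1):
    # Replay the client's local SGD steps over all epochs and batches.
    for x, y in batches:
        grads = jax.grad(loss_fn)(params, x, y, model_apply)
        params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params

def reconstruction_loss(dummy_inputs, labels, init_params, observed_update,
                        model_apply, lr=0.1):
    # `dummy_inputs` is a list of per-batch image tensors being optimized.
    batches = list(zip(dummy_inputs, labels))
    final_params = simulate_fedavg(init_params, batches, model_apply, lr)
    simulated_update = jax.tree_util.tree_map(
        lambda f, i: f - i, final_params, init_params)
    # Match the simulated update to the received client update (L2 distance).
    diffs = jax.tree_util.tree_map(
        lambda s, o: jnp.sum((s - o) ** 2), simulated_update, observed_update)
    return sum(jax.tree_util.tree_leaves(diffs))

# Automatic differentiation propagates the matching loss through the entire
# simulated local training run, back to the dummy inputs:
recon_grad = jax.grad(reconstruction_loss, argnums=0)
```

In practice, an attacker would feed `recon_grad` to a standard optimizer and combine the matching loss with the permutation-invariant prior described above; the sketch assumes the client's learning rate and batch schedule are known.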