Recent attacks have shown that user data can be recovered from FedSGD updates, thus breaking privacy. However, these attacks are of limited practical relevance, as federated learning typically uses the FedAvg algorithm. Compared to FedSGD, recovering data from FedAvg updates is much harder because: (i) the updates are computed at unobserved intermediate network weights, (ii) a large number of batches are used, and (iii) labels and network weights vary simultaneously across client steps. In this work, we propose a new optimization-based attack which successfully attacks FedAvg by addressing the above challenges. First, we solve the optimization problem using automatic differentiation, forcing a simulation of the client's update, including the unobserved intermediate network weights, to match the received client update for the recovered labels and inputs. Second, we address the large number of batches by relating images from different epochs with a permutation invariant prior. Third, we recover the labels by estimating the parameters of existing FedSGD attacks at every FedAvg step. On the popular FEMNIST dataset, we demonstrate that on average we successfully recover >45% of the client's images from realistic FedAvg updates computed on 10 local epochs of 10 batches each with 5 images, compared to only <10% for the baseline. Our findings show that many real-world federated learning implementations based on FedAvg are vulnerable.
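To make the first step concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of the core idea: differentiate through the client's unrolled local SGD so that dummy inputs are optimized until the simulated FedAvg update, including the unobserved intermediate weights it passes through, matches the update the server received. All sizes, hyperparameters, and the stand-in linear model are illustrative assumptions, and the labels are taken as already recovered (step three above).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative toy sizes; the paper's FEMNIST setting uses 10 local epochs
# of 10 batches with 5 images each on a convolutional network.
N_IMG, DIM, N_CLS = 4, 16, 3
LR, EPOCHS, BATCHES = 0.1, 2, 2

def model_fn(theta, x):
    # Stand-in linear classifier (hypothetical) to keep the sketch short.
    W, b = theta
    return x @ W.t() + b

def simulate_client(theta0, xs, ys):
    # Unroll the client's local SGD starting from the server weights theta0.
    # create_graph=True keeps every step differentiable, so autodiff can push
    # gradients from the simulated final weights back into xs, even though
    # the intermediate weights are never observed by the server.
    theta = [w.clone().requires_grad_(True) for w in theta0]
    bs = xs.shape[0] // BATCHES
    for _ in range(EPOCHS):
        for b in range(BATCHES):
            xb, yb = xs[b * bs:(b + 1) * bs], ys[b * bs:(b + 1) * bs]
            loss = F.cross_entropy(model_fn(theta, xb), yb)
            grads = torch.autograd.grad(loss, theta, create_graph=True)
            theta = [w - LR * g for w, g in zip(theta, grads)]
    return theta

# Server-side view: initial weights sent out, final update received back.
theta0 = [torch.randn(N_CLS, DIM) * 0.1, torch.zeros(N_CLS)]
x_true = torch.randn(N_IMG, DIM)            # hidden client images
y_true = torch.randint(0, N_CLS, (N_IMG,))  # labels, assumed recovered
theta_obs = [w.detach() for w in simulate_client(theta0, x_true, y_true)]

# The attack: optimize dummy inputs so the simulated update matches the
# observed one. Exact recovery is not guaranteed in this toy setting.
x_hat = torch.randn(N_IMG, DIM, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.05)
for step in range(300):
    opt.zero_grad()
    theta_sim = simulate_client(theta0, x_hat, y_true)
    rec_loss = sum(((ws - wo) ** 2).sum()
                   for ws, wo in zip(theta_sim, theta_obs))
    rec_loss.backward()
    opt.step()

print("reconstruction error:", (x_hat - x_true).norm().item())
```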
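The second step, relating images from different epochs, is only named in the abstract, and its exact formulation is given in the paper. As a hedged illustration, one permutation-invariant way to tie two epochs' candidate reconstructions of the same (unknown, reshuffled) client images together is an optimal-assignment penalty; the function below is a hypothetical sketch, not the paper's prior.

```python
import torch
from scipy.optimize import linear_sum_assignment

def perm_invariant_prior(xs_a, xs_b):
    # Hypothetical matching-based prior: relate two per-epoch candidate
    # reconstructions without assuming any ordering of the images. The
    # assignment is computed on detached pairwise distances; the penalty on
    # the matched pairs stays differentiable w.r.t. both reconstructions.
    with torch.no_grad():
        cost = torch.cdist(xs_a.flatten(1), xs_b.flatten(1))
    rows, cols = linear_sum_assignment(cost.cpu().numpy())
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    return ((xs_a[rows] - xs_b[cols]) ** 2).sum()
```

A term like this could be added with a small weight to the update-matching loss in the sketch above; the paper's actual prior and its weighting may differ.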