Recent studies show that private training data can be leaked through the gradient-sharing mechanism deployed in distributed machine learning systems, such as federated learning (FL). Increasing the batch size to complicate data recovery is often viewed as a promising defense against data leakage. In this paper, we revisit this defense premise and propose an advanced data leakage attack, with theoretical justification, that efficiently recovers batch data from the shared aggregated gradients. We name our proposed method CAFE: catastrophic data leakage in vertical federated learning. Compared to existing data leakage attacks, our extensive experimental results in vertical FL settings demonstrate the effectiveness of CAFE in performing large-batch data leakage attacks with improved data recovery quality. We also propose a practical countermeasure to mitigate CAFE. Our results suggest that private data participating in standard FL, especially in the vertical case, are at high risk of being leaked from the training gradients. Our analysis implies unprecedented and practical data leakage risks in these learning settings. The code for our work is available at https://github.com/DeRafael/CAFE.
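To illustrate the gradient-matching idea that underlies such data leakage attacks, below is a minimal sketch of a DLG-style attack (deep leakage from gradients): the attacker optimizes dummy inputs and labels so that their gradients match the gradients shared by a victim. This is an illustrative sketch of the general technique, not the CAFE algorithm; the toy model, data shapes, and optimizer settings are assumptions.

```python
# Minimal DLG-style gradient-matching sketch (illustrative; not CAFE itself).
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # assumed toy model
criterion = nn.CrossEntropyLoss()

# Victim side: compute gradients on a private batch and "share" them.
x_true = torch.rand(4, 1, 28, 28)           # private images (batch size 4)
y_true = torch.randint(0, 10, (4,))         # private labels
loss = criterion(model(x_true), y_true)
shared_grads = [g.detach() for g in torch.autograd.grad(loss, model.parameters())]

# Attacker side: optimize dummy data so its gradients match the shared ones.
x_dummy = torch.rand(4, 1, 28, 28, requires_grad=True)
y_dummy = torch.randn(4, 10, requires_grad=True)   # soft labels, also recovered
opt = torch.optim.LBFGS([x_dummy, y_dummy])

for _ in range(100):
    def closure():
        opt.zero_grad()
        # Cross-entropy with soft dummy labels, differentiable in y_dummy.
        dummy_loss = torch.mean(torch.sum(
            -torch.softmax(y_dummy, -1) * torch.log_softmax(model(x_dummy), -1), -1))
        dummy_grads = torch.autograd.grad(
            dummy_loss, model.parameters(), create_graph=True)
        # Gradient-matching objective: squared distance to the shared gradients.
        match = sum(((dg - sg) ** 2).sum()
                    for dg, sg in zip(dummy_grads, shared_grads))
        match.backward()
        return match
    opt.step(closure)

# Recovery error shrinks as the gradient match improves.
print(torch.norm(x_dummy.detach() - x_true))
```

As the abstract notes, simply increasing the batch size degrades this kind of recovery in practice, which is the defense premise that CAFE revisits for the vertical FL setting.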