Federated learning (FL) enables the collaborative training of AI models without the need to share raw data. This capability makes it especially attractive for healthcare applications where patient and data privacy are of utmost concern. However, recent works on the inversion of deep neural networks from model gradients have raised concerns about the security of FL and its ability to prevent the leakage of training data. In this work, we show that the attacks presented in the literature are impractical in FL use cases where the clients' training involves updating the Batch Normalization (BN) statistics, and we provide a new baseline attack that works in such scenarios. Furthermore, we present new ways to measure and visualize potential data leakage in FL. Our work is a step towards establishing reproducible methods of measuring data leakage in FL and could help determine the optimal tradeoffs between privacy-preserving techniques, such as differential privacy, and model accuracy, based on quantifiable metrics. Code is available at https://nvidia.github.io/NVFlare/research/quantifying-data-leakage.
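To make the BN point concrete, the following is a minimal PyTorch sketch (not taken from the paper's codebase; the toy model and training step are illustrative assumptions). It shows that when a client trains in train() mode, the BN running statistics (buffers) change alongside the weights, so the model update a client shares is not determined by the weight gradients alone.

```python
# Hypothetical client-side snippet: illustrates that local training updates
# BN running statistics (buffers), which receive no gradients themselves.
import copy
import torch
import torch.nn as nn

# Toy model with a BatchNorm layer, standing in for a client's network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
before = copy.deepcopy(model.state_dict())

model.train()                          # BN updates running_mean / running_var in this mode
x = torch.randn(4, 3, 32, 32)          # a local (private) batch
loss = model(x).mean()
loss.backward()                        # gradients w.r.t. weights and biases only
with torch.no_grad():                  # a plain SGD step on the parameters
    for p in model.parameters():
        p -= 0.1 * p.grad

after = model.state_dict()
# The BN buffers changed even though they are not trained by gradient descent:
for name in ("1.running_mean", "1.running_var"):
    print(name, "unchanged:", torch.allclose(before[name], after[name]))  # -> False
```

In this setting the update sent to the server mixes gradient-driven weight changes with data-dependent BN statistics, which is the training regime the abstract refers to when discussing the practicality of gradient inversion attacks.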