Federated learning (FL) has emerged as a promising privacy-aware paradigm that allows multiple clients to jointly train a model without sharing their private data. Recently, many studies have shown that FL is vulnerable to membership inference attacks (MIAs), which distinguish the training members of a given model from non-members. However, existing MIAs ignore the source of a training member, i.e., the information of which client owns it, although it is essential to explore source privacy in FL beyond the membership privacy of examples from all clients. The leakage of source information can lead to severe privacy issues. For example, identifying which hospital contributed to the training of an FL model for COVID-19 can render the owner of a data record from that hospital more prone to discrimination if the hospital is in a high-risk region. In this paper, we propose a new inference attack called the source inference attack (SIA), which can derive an optimal estimation of the source of a training member. Specifically, we adopt a Bayesian perspective to demonstrate that an honest-but-curious server can launch an SIA to steal non-trivial source information about the training members without violating the FL protocol. The server leverages the prediction loss of local models on the training members to mount the attack effectively and non-intrusively. We conduct extensive experiments on one synthetic and five real datasets to evaluate the key factors in an SIA, and the results demonstrate the efficacy of the proposed attack.
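To make the loss-based attribution step concrete, below is a minimal sketch in PyTorch-style Python of how an honest-but-curious server could score a record known to be a training member against each client's uploaded local model and attribute it to the client with the lowest prediction loss. The function name `source_inference`, the `local_models` list, and the use of cross-entropy loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def source_inference(local_models, x, y):
    """Attribute a known training record (x, y) to the client whose
    local model fits it best, i.e., has the lowest prediction loss.

    local_models: list of per-client models from one FL round
                  (hypothetical structure; assumed to be in eval mode).
    x: input tensor for a single example; y: its label as a 0-dim long tensor.
    Returns the index of the predicted source client.
    """
    losses = []
    with torch.no_grad():
        for model in local_models:
            logits = model(x.unsqueeze(0))               # single-example batch
            loss = F.cross_entropy(logits, y.unsqueeze(0))
            losses.append(loss.item())
    # Under a uniform prior over clients, the lowest loss corresponds to
    # the highest posterior probability that this client holds the record.
    return losses.index(min(losses))
```

This non-intrusive design only evaluates the local models the server already receives under the standard FL protocol, which is why the attack requires no deviation from honest aggregation.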