Vertical federated learning (VFL) has recently emerged as a prominent paradigm for processing data distributed across many individual sources without the need to centralize it. Multiple participants collaboratively train a model on their local data in a privacy-aware manner. To date, VFL has become a de facto solution for securely learning a model across organizations, allowing knowledge to be shared without compromising the privacy of any individual. Despite the rapid development of VFL systems, we find that certain inputs of a participant, which we name adversarial dominating inputs (ADIs), can steer the joint inference in the direction of the adversary's choosing and force the other (victim) participants to make negligible contributions, costing them the rewards that are usually allocated according to the importance of participants' contributions in federated learning scenarios. We conduct a systematic study of ADIs by first proving their existence in typical VFL systems. We then propose gradient-based methods to synthesize ADIs of various formats and exploit common VFL systems. We further launch greybox fuzz testing, guided by the saliency scores of the ``victim'' participants, to perturb adversary-controlled inputs and systematically explore the VFL attack surface in a privacy-preserving manner. We also conduct an in-depth study of how critical parameters and settings influence the synthesis of ADIs. Our study reveals new VFL attack opportunities, promoting the identification of unknown threats before breaches occur and the construction of more secure VFL systems.
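As a rough, self-contained sketch of the gradient-based synthesis idea above (not the paper's exact algorithm), the snippet below optimizes an adversary-controlled input so that the joint prediction stays at an adversary-chosen target regardless of the victim's features. The split-model functions `f_adv` and `f_victim`, the top model `g`, the concatenation-based fusion, and the white-box access they imply are all illustrative assumptions, not details taken from the abstract.

```python
import torch

def synthesize_adi(f_adv, f_victim, g, x_adv, victim_batch, target,
                   steps=200, lr=0.05):
    """Hypothetical gradient-based search for an adversarial dominating
    input (ADI): optimize the adversary-controlled features x_adv so that
    the joint prediction g(f_adv(x_adv), f_victim(x_v)) matches `target`
    (a LongTensor of class labels) for every victim input x_v in
    victim_batch, rendering the victim's contribution negligible."""
    x = x_adv.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        h_adv = f_adv(x)                      # adversary's local embedding
        loss = 0.0
        for x_v in victim_batch:              # average over victim inputs so the
            h_vic = f_victim(x_v)             # ADI dominates for *any* victim data
            logits = g(torch.cat([h_adv, h_vic], dim=-1))
            loss = loss + loss_fn(logits, target)
        (loss / len(victim_batch)).backward()
        opt.step()
        with torch.no_grad():                 # keep features in a valid range
            x.clamp_(0.0, 1.0)
    return x.detach()
```

Under the abstract's threat model, direct gradient access to the victim side would not be available; the greybox fuzzing variant described above instead relies on saliency-score feedback from the ``victim'' participants, so this sketch should be read only as the underlying optimization template.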