Vertical federated learning (vFL) has gained much attention in recent years and has been deployed to solve machine learning problems with data privacy concerns. However, recent work has demonstrated that vFL is vulnerable to privacy leakage even though only the forward intermediate embedding (rather than raw features) and backpropagated gradients (rather than raw labels) are communicated between the participants. Because raw labels often contain highly sensitive information, several recent methods have been proposed to prevent label leakage from the backpropagated gradients in vFL. However, these works only identify and defend against label leakage from the backpropagated gradients; none of them addresses label leakage from the intermediate embedding. In this paper, we propose a practical label inference method that can steal private labels effectively from the shared intermediate embedding even when existing protection methods such as label differential privacy and gradient perturbation are applied. The effectiveness of the label attack is closely tied to the correlation between the intermediate embedding and the corresponding private labels. To mitigate label leakage from the forward embedding, we add an additional optimization goal at the label party that limits the adversary's label stealing ability by minimizing the distance correlation between the intermediate embedding and the corresponding private labels. We conducted extensive experiments to demonstrate the effectiveness of our proposed protection methods.
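The abstract does not state how the distance correlation is computed, so the following is only a minimal sketch of the standard sample distance correlation (in the sense of Székely et al.), not necessarily the estimator used in the paper. The function names `distance_correlation`, `_centered_distances`, and `_mean_product` are hypothetical, and plain Python is used for clarity; a real training objective would presumably compute the same quantity per mini-batch in a differentiable framework.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered.

    `points` is a sequence of equal-length numeric tuples (one per sample).
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Mean of the elementwise product of two n-by-n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y.

    Assumption: this is the textbook (biased) estimator; it lies in [0, 1]
    and is 0.0 by convention when either sample has zero distance variance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)
    dvar2 = _mean_product(a, a) * _mean_product(b, b)
    if dvar2 <= 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))
```

Called as `distance_correlation(embeddings, labels)` with each embedding and each label given as a tuple, a value near 1 indicates that the embedding carries strong information about the labels, while a value closer to 0 indicates weaker dependence. A defense along the lines described above would add this quantity, suitably weighted, to the label party's training loss.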