As a crucial building block of vertical Federated Learning (vFL), Split Learning (SL) has demonstrated its practicality in two-party collaborative model training, where one party holds the features of the data samples and the other party holds the corresponding labels. This method is claimed to be private, since the shared information consists only of embedding vectors and gradients rather than the private raw data and labels. However, recent works have shown that the private labels can be leaked from the gradients. These existing attacks work only in the classification setting, where the private labels are discrete. In this work, we go a step further and study label leakage in the regression setting, where the private labels are continuous numbers rather than discrete classes. The unbounded output range makes it harder for previous attacks to infer continuous labels. To address this limitation, we propose a novel learning-based attack that integrates gradient information with additional learning-regularization objectives derived from model training properties, and that can effectively infer labels in the regression setting. Comprehensive experiments on various datasets and models demonstrate the effectiveness of our proposed attack. We hope our work paves the way for future analyses that make the vFL framework more secure.
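To make the leakage channel concrete, the sketch below illustrates a generic two-party split-learning training step for regression; it is a minimal illustration, not the paper's exact protocol, and the model names (bottom, top) and dimensions are hypothetical. With an MSE loss and a linear top model, the gradient sent back to the feature party is a linear function of the residual, so the continuous label enters the shared gradient directly.

```python
import torch

torch.manual_seed(0)

# Party A (feature holder): bottom model mapping private features to embeddings.
bottom = torch.nn.Linear(10, 4)
x = torch.randn(8, 10)          # private raw features, never shared

# Party B (label holder): top model and private continuous labels.
top = torch.nn.Linear(4, 1)
y = torch.randn(8, 1)           # private regression labels, never shared

emb = bottom(x)                 # embedding sent from A to B
emb.retain_grad()               # keep the gradient that A will receive back

pred = top(emb)
loss = torch.nn.functional.mse_loss(pred, y)
loss.backward()

# Gradient returned to Party A. With MSE loss and a linear top model,
#   d(loss)/d(emb) = (2 / pred.numel()) * (pred - y) @ top.weight,
# i.e. a linear function of the residual (pred - y): the continuous label y
# enters the shared gradient directly, which is the signal a gradient-based
# label-inference attack exploits.
print(emb.grad)
```

In this illustrative setup, an attacker on the feature-holder side who can estimate top.weight (or learn a surrogate for it) can invert the received gradient to recover the residual, and hence the label.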