As a crucial building block of vertical Federated Learning (vFL), Split Learning (SL) has proven practical for two-party collaborative model training, where one party holds the features of the data samples and the other party holds the corresponding labels. The method is claimed to be private because the parties share only embedding vectors and gradients rather than private raw data and labels. However, recent works have shown that the private labels can be leaked through the shared gradients. These existing attacks work only in the classification setting, where the private labels are discrete. In this work, we go a step further and study label leakage for regression models, where the private labels are continuous numbers rather than discrete classes. The unbounded output range makes it much harder for previous attacks to infer continuous labels. To address this limitation, we propose a novel learning-based attack that combines gradient information with additional regularization objectives derived from model training properties, and that effectively infers labels in the regression setting. Comprehensive experiments on various datasets and models demonstrate the effectiveness of the proposed attack. We hope our work paves the way for future analyses that make the vFL framework more secure.
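To make the threat model concrete, the following is a minimal sketch of one two-party SL training step as described above: the non-label party sends only an embedding, and the label party returns only the gradient with respect to that embedding. The dimensions, architectures, and variable names here are illustrative assumptions, not specifics from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sizes and models for illustration only.
feature_dim, embed_dim, batch = 16, 8, 4

bottom_model = nn.Sequential(nn.Linear(feature_dim, embed_dim), nn.ReLU())  # party A
top_model = nn.Linear(embed_dim, 1)                                          # party B

x = torch.randn(batch, feature_dim)  # private features, held by party A
y = torch.randn(batch, 1)            # private continuous labels, held by party B

# Forward pass: party A shares only the embedding, never the raw features.
embedding = bottom_model(x)
shared = embedding.detach().requires_grad_(True)

# Party B computes the regression loss and returns the gradient w.r.t. the
# shared embedding -- the only label-dependent signal party A ever observes.
loss = nn.functional.mse_loss(top_model(shared), y)
loss.backward()
grad_from_B = shared.grad

# Party A finishes backpropagation through its own bottom model.
embedding.backward(grad_from_B)
```

The privacy claim rests on the fact that `y` never leaves party B; the attack question is how much of `y` can be reconstructed from `grad_from_B` alone.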
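For intuition on why the returned gradient leaks continuous labels, the sketch below shows a generic gradient-matching label-inference baseline: the attacker optimizes surrogate labels so that the gradient they would induce matches the observed one. This is an illustrative baseline under strong assumptions (the attacker knows or approximates the label party's top model), not the paper's exact objective; the paper's method additionally integrates regularization terms based on model training properties.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, embed_dim = 4, 8

# Assumption for this sketch: the attacker has access to (or a surrogate of)
# the label party's top model.
top_model = nn.Linear(embed_dim, 1)
top_model.requires_grad_(False)
shared_emb = torch.randn(batch, embed_dim)

# Simulate the signal the label party would return for true labels y_true.
y_true = torch.randn(batch, 1)
emb = shared_emb.clone().requires_grad_(True)
nn.functional.mse_loss(top_model(emb), y_true).backward()
grad_obs = emb.grad.detach()

# Attack: optimize surrogate labels so the induced gradient matches grad_obs.
y_hat = torch.zeros(batch, 1, requires_grad=True)
opt = torch.optim.Adam([y_hat], lr=0.1)
for _ in range(1000):
    opt.zero_grad()
    emb = shared_emb.clone().requires_grad_(True)
    g, = torch.autograd.grad(
        nn.functional.mse_loss(top_model(emb), y_hat), emb, create_graph=True
    )
    # A learning-based attack would add extra regularization objectives here.
    ((g - grad_obs) ** 2).sum().backward()
    opt.step()

print(torch.cat([y_true, y_hat.detach()], dim=1))  # true vs. recovered labels
```

With a linear top model and MSE loss, the gradient difference is proportional to `y_true - y_hat`, so the matching objective is convex in the surrogate labels and recovers them exactly; the unbounded range of continuous labels is what makes the general case harder and motivates the extra regularization the abstract mentions.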