Federated learning (FL) enables distributed devices to jointly train a shared model while keeping the training data local. Different from the horizontal FL (HFL) setting, where each client holds a subset of the data samples, vertical FL (VFL), where each client collects a subset of the features, has attracted intensive research efforts recently. In this paper, we identify two challenges that state-of-the-art VFL frameworks face: (1) some works directly average the learned feature embeddings and therefore might lose the unique properties of each local feature set; (2) the server needs to exchange gradients with the clients at every training step, incurring a high communication cost that leads to rapid consumption of the privacy budget. To address these challenges, we propose an efficient VFL framework with multiple linear heads (VIM), where each head corresponds to one local client, thereby taking the separate contribution of each client into account. In addition, we propose an Alternating Direction Method of Multipliers (ADMM)-based method to solve our optimization problem, which reduces the communication cost by allowing multiple local updates in each step and thus leads to better performance under differential privacy. We consider two settings: VFL with model splitting and VFL without model splitting. For both settings, we carefully analyze the differential privacy mechanism for our framework. Moreover, we show that a byproduct of our framework is that the weights of the learned linear heads reflect the importance of the local clients. We conduct extensive evaluations on four real-world datasets and show that VIM achieves significantly higher performance and faster convergence than state-of-the-art baselines. We also explicitly evaluate the importance of local clients and show that VIM enables functionalities such as client-level explanation and client denoising.
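To make the multi-head idea concrete, below is a minimal PyTorch sketch of the server-side aggregation in the model-splitting setting: instead of averaging client embeddings, the server applies a separate linear head to each client's embedding and sums the per-client logits. All names and dimensions here (VIMServerHead, embed_dim, etc.) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class VIMServerHead(nn.Module):
    """Sketch of a VIM-style server model (model-splitting setting).

    Each client k sends an embedding h_k computed from its local feature
    block. The server applies a per-client linear head W_k and sums the
    resulting logits, so each client's separate contribution is preserved
    rather than lost in an average.
    """

    def __init__(self, num_clients: int, embed_dim: int, num_classes: int):
        super().__init__()
        # One linear head per client; the norm of each head's weights can
        # later serve as a proxy for that client's importance.
        self.heads = nn.ModuleList(
            [nn.Linear(embed_dim, num_classes, bias=False)
             for _ in range(num_clients)]
        )

    def forward(self, client_embeddings):
        # client_embeddings[k]: tensor of shape (batch, embed_dim) from client k
        return sum(head(h) for head, h in zip(self.heads, client_embeddings))


# Toy usage: 4 clients, 16-dim embeddings, 10 classes
if __name__ == "__main__":
    model = VIMServerHead(num_clients=4, embed_dim=16, num_classes=10)
    embeddings = [torch.randn(8, 16) for _ in range(4)]
    print(model(embeddings).shape)  # torch.Size([8, 10])
```

Under this sketch, the per-client head weights are the quantity the abstract refers to when it says the learned linear heads reflect client importance: a head whose weights shrink toward zero contributes little to the prediction, which is what enables client-level explanation and denoising.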