Vertical federated learning (VFL), a variant of Federated Learning (FL), has recently drawn increasing attention because it matches enterprises' demand for leveraging more valuable features to achieve better model performance. However, conventional VFL methods may run into data deficiency as they exploit only aligned and labeled samples (whose features are distributed across parties), often leaving the majority of unaligned and unlabeled samples unused. This data deficiency hampers the effort of the federation. In this work, we propose a Federated Hybrid Self-Supervised Learning framework, named FedHSSL, that utilizes cross-party views (i.e., dispersed features) of samples aligned among parties and local views (i.e., augmentations) of unaligned samples within each party to improve the representation learning capability of the VFL joint model. FedHSSL further exploits invariant features across parties to boost the performance of the joint model through partial model aggregation. As a framework, FedHSSL can work with various representative SSL methods. We empirically demonstrate that FedHSSL outperforms baseline methods by large margins. We also provide an in-depth analysis of FedHSSL regarding label leakage, which is rarely investigated in existing self-supervised VFL works. The experimental results show that, with proper protection, FedHSSL achieves the best privacy-utility trade-off against state-of-the-art label inference attacks compared with baselines. Code is available at \url{https://github.com/jorghyq2016/FedHSSL}.
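To make the three ingredients of the abstract concrete, below is a minimal, self-contained sketch of the idea: cross-party SSL on aligned samples, local augmentation-based SSL on unaligned samples, and partial aggregation of a shared model component. It is not the authors' implementation (see the linked repository for that): the class and function names (`Party`, `cross_party_step`, `local_step`, `aggregate_partial`) are hypothetical, the SSL objective is a SimSiam-style stop-gradient cosine loss standing in for "various representative SSL methods", the augmentation is simple additive noise, and the choice to aggregate only the predictor is an assumed illustration of "partial model aggregation".

```python
# Hypothetical sketch of hybrid self-supervised VFL (two parties).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class Party(nn.Module):
    """One party: an encoder over its own feature slice plus a predictor."""
    def __init__(self, in_dim: int, rep_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, rep_dim))
        self.predictor = nn.Linear(rep_dim, rep_dim)


def neg_cosine(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # SimSiam-style loss: predict the (stop-gradient) target representation.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()


def cross_party_step(a: Party, b: Party, xa, xb):
    # Aligned samples: each party's view predicts the other party's view.
    za, zb = a.encoder(xa), b.encoder(xb)
    return neg_cosine(a.predictor(za), zb) + neg_cosine(b.predictor(zb), za)


def local_step(p: Party, x):
    # Unaligned samples: two random augmentations (here, additive noise)
    # of the same local sample form the positive pair.
    z1 = p.encoder(x + 0.1 * torch.randn_like(x))
    z2 = p.encoder(x + 0.1 * torch.randn_like(x))
    return neg_cosine(p.predictor(z1), z2) + neg_cosine(p.predictor(z2), z1)


def aggregate_partial(parties):
    # Partial aggregation (assumed form): average only the predictor,
    # keeping each party's feature-specific encoder local.
    avg = copy.deepcopy(parties[0].predictor.state_dict())
    for k in avg:
        avg[k] = torch.stack([p.predictor.state_dict()[k]
                              for p in parties]).mean(0)
    for p in parties:
        p.predictor.load_state_dict(avg)


if __name__ == "__main__":
    torch.manual_seed(0)
    A, B = Party(in_dim=10), Party(in_dim=6)
    opt = torch.optim.SGD(list(A.parameters()) + list(B.parameters()), lr=0.05)
    xa_al, xb_al = torch.randn(16, 10), torch.randn(16, 6)   # aligned pairs
    xa_un, xb_un = torch.randn(64, 10), torch.randn(64, 6)   # unaligned
    for _ in range(5):
        loss = (cross_party_step(A, B, xa_al, xb_al)
                + local_step(A, xa_un) + local_step(B, xb_un))
        opt.zero_grad(); loss.backward(); opt.step()
        aggregate_partial([A, B])
        print(f"loss = {loss.item():.4f}")
```

The point of the sketch is the loss composition: the cross-party term uses only the (typically scarce) aligned samples, while the local term lets each party exploit its abundant unaligned data, and the aggregation step shares only a generic component rather than the full, feature-specific encoders.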