Federated learning allows multiple parties to collaboratively build machine learning models without exposing their data. In particular, vertical federated learning (VFL) enables participating parties to build a joint machine learning model based upon distributed features of aligned samples. However, VFL requires all parties to share a sufficient number of aligned samples. In reality, the set of aligned samples may be small, leaving the majority of the non-aligned data unused. In this article, we propose Federated Cross-view Training (FedCVT), a semi-supervised learning approach that improves the performance of the VFL model with limited aligned samples. More specifically, FedCVT estimates representations for missing features, predicts pseudo-labels for unlabeled samples to expand the training set, and trains three classifiers jointly based upon different views of the expanded training set to improve the VFL model's performance. FedCVT does not require parties to share their original data or model parameters, thus preserving data privacy. We conduct experiments on the NUS-WIDE, Vehicle, and CIFAR10 datasets. The experimental results demonstrate that FedCVT significantly outperforms vanilla VFL, which utilizes only aligned samples. Finally, we perform ablation studies to investigate the contribution of each component of FedCVT to its overall performance.
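As a rough illustration of the cross-view training idea summarized above, the following is a minimal sketch, not the authors' implementation: it assumes two parties whose local encoders produce sample representations, a simple estimator for the missing party's representation, a confidence threshold for pseudo-labels, and three jointly trained classifiers (one per view plus a combined view). All module names, shapes, and the threshold value are illustrative assumptions.

```python
# Hypothetical sketch of the FedCVT training idea (toy data, linear modules).
import torch
import torch.nn as nn

d_a, d_b, d_rep, n_classes = 20, 30, 16, 5

enc_a = nn.Linear(d_a, d_rep)            # party A's local encoder
enc_b = nn.Linear(d_b, d_rep)            # party B's local encoder
est_b_from_a = nn.Linear(d_rep, d_rep)   # estimates B's representation from A's
clf_a = nn.Linear(d_rep, n_classes)      # view-A classifier
clf_b = nn.Linear(d_rep, n_classes)      # view-B classifier
clf_ab = nn.Linear(2 * d_rep, n_classes) # combined-view classifier

params = (list(enc_a.parameters()) + list(enc_b.parameters()) +
          list(est_b_from_a.parameters()) + list(clf_a.parameters()) +
          list(clf_b.parameters()) + list(clf_ab.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()

# Toy data: a small aligned, labeled set and a larger A-only, unlabeled set.
x_a_aligned = torch.randn(32, d_a)
x_b_aligned = torch.randn(32, d_b)
y_aligned = torch.randint(0, n_classes, (32,))
x_a_only = torch.randn(128, d_a)

for step in range(100):
    # Representations of the aligned samples from both views.
    r_a = enc_a(x_a_aligned)
    r_b = enc_b(x_b_aligned)

    # For A-only samples, estimate the missing B-side representation.
    r_a_u = enc_a(x_a_only)
    r_b_u = est_b_from_a(r_a_u)

    # Pseudo-label A-only samples with the combined classifier; keep only
    # confident predictions to expand the training set.
    with torch.no_grad():
        probs = torch.softmax(clf_ab(torch.cat([r_a_u, r_b_u], dim=1)), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        keep = conf > 0.9  # illustrative confidence threshold

    # Joint loss of the three classifiers over the expanded training set.
    r_a_all = torch.cat([r_a, r_a_u[keep]])
    r_b_all = torch.cat([r_b, r_b_u[keep]])
    y_all = torch.cat([y_aligned, pseudo_y[keep]])

    loss = (ce(clf_a(r_a_all), y_all) +
            ce(clf_b(r_b_all), y_all) +
            ce(clf_ab(torch.cat([r_a_all, r_b_all], dim=1)), y_all))

    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the actual federated setting, each party would compute its encoder locally and exchange only representations and gradients rather than raw features or model parameters; the sketch collapses both parties into one process purely to show the expanded-training-set and three-view objective.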