As an emerging secure learning paradigm for leveraging cross-silo private data, vertical federated learning (VFL) is expected to improve advertising models by enabling the joint learning of complementary user attributes privately owned by the advertiser and the publisher. However, two issues have limited its application to advertising systems: 1) it applies only to overlapped samples, and 2) real-time federated serving poses a high system challenge. In this paper, we advocate a new learning setting, Semi-VFL (Vertical Semi-Federated Learning), as a lightweight solution that utilizes all available data (both overlapped and non-overlapped) and is free from federated serving. Semi-VFL is expected to perform better than single-party models while maintaining a low inference cost. A good solution for Semi-VFL must i) alleviate the absence of the passive party's features and ii) adapt to the whole sample space. We therefore propose a carefully designed joint privileged learning framework (JPL) as an efficient implementation of Semi-VFL. Specifically, we build an inference-efficient single-party student model that is applicable to the whole sample space while retaining the advantage of the federated feature extension. We propose novel feature imitation and ranking consistency restriction methods to extract cross-party feature correlations and to maintain cross-sample-space consistency for both the overlapped and non-overlapped data. We conducted extensive experiments on real-world advertising datasets. The results show that our method outperforms the baseline methods and validate its effectiveness in maintaining cross-view feature correlation.
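The abstract names two training signals, feature imitation and a ranking consistency restriction, but does not define them. The sketch below is only one plausible reading, not the authors' formulation: it assumes feature imitation is a mean squared error between the student's imitated passive-party representation and the real one on overlapped samples, and that ranking consistency is a pairwise logistic loss asking the single-party student to order samples the way a federated teacher does. The function names and the choice of losses are hypothetical.

```python
import numpy as np


def feature_imitation_loss(imitated, passive):
    """Assumed form: mean squared error between the student's imitated
    passive-party representation and the real one. Only meaningful on
    overlapped samples, where the passive party's features exist."""
    return float(np.mean((imitated - passive) ** 2))


def ranking_consistency_loss(student_scores, teacher_scores):
    """Assumed form: pairwise logistic loss. For every pair (i, j) that the
    teacher ranks i above j, the student is penalized unless it also scores
    i above j."""
    s_diff = student_scores[:, None] - student_scores[None, :]
    t_diff = teacher_scores[:, None] - teacher_scores[None, :]
    mask = t_diff > 0  # pairs where the teacher prefers i over j
    if not mask.any():
        return 0.0  # teacher expresses no preference, nothing to enforce
    return float(np.mean(np.log1p(np.exp(-s_diff[mask]))))


# Toy usage: a student that agrees with the teacher's ordering gets a
# smaller ranking loss than one that reverses it.
teacher = np.array([0.9, 0.1])
agree = ranking_consistency_loss(np.array([2.0, -2.0]), teacher)
disagree = ranking_consistency_loss(np.array([-2.0, 2.0]), teacher)
```

In a setting like the one described, such terms would be added to the student's ordinary prediction loss, so that at serving time only the single-party student is needed.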