As an emerging secure learning paradigm for leveraging cross-agency private data, vertical federated learning (VFL) is expected to improve advertising models by enabling the joint learning of complementary user attributes privately owned by the advertiser and the publisher. However, two key challenges arise in applying it to advertising systems: a) the limited scale of labeled overlapping samples, and b) the high cost of real-time cross-agency serving. In this paper, we propose a semi-supervised split distillation framework, VFed-SSD, to alleviate these two limitations. We observe that: i) massive unlabeled overlapping data are available in advertising systems, and ii) we can balance model performance against inference cost by decomposing the federated model. Specifically, we develop a self-supervised task, Matched Pair Detection (MPD), to exploit the vertically partitioned unlabeled data, and propose the Split Knowledge Distillation (SplitKD) scheme to avoid cross-agency serving. Empirical studies on three industrial datasets demonstrate the effectiveness of our methods, with the median AUC over all datasets improved by 0.86% and 2.6% in the local deployment mode and the federated deployment mode, respectively. Overall, our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deployment cost and a significant performance lift.