Vertical federated learning (VFL) aims to train models from cross-silo data with different feature spaces stored on different platforms. Existing VFL methods usually assume that all data on each platform can be used for model training. However, due to the intrinsic privacy risks of federated learning, the total amount of data involved may be constrained. In addition, existing VFL studies usually assume that only one platform has task labels and can benefit from the collaboration, which makes it difficult to attract other platforms to join. In this paper, we study the platform collaboration problem in VFL under privacy constraints. We propose to incentivize platforms through reciprocal collaboration, in which all platforms can exploit multi-platform information in the VFL framework to benefit their own tasks. With a limited privacy budget, each platform must carefully allocate its data quotas for collaboration with other platforms, so the platforms naturally form a multi-party game. This game poses two core problems: how to appraise the value of other platforms' data in order to compute game rewards, and how to optimize policies to solve the game. To evaluate the contributions of other platforms' data, each platform offers a small amount of "deposit" data to participate in the VFL. We propose a performance estimation method that predicts the expected model performance for different combinations of data amounts contributed by the platforms. To solve the game, we propose a platform negotiation method that simulates bargaining among platforms and locally optimizes their policies via gradient descent. Extensive experiments on two real-world datasets show that our approach can effectively facilitate the collaborative exploitation of multi-platform data in VFL under privacy restrictions.
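To make the quota-allocation idea concrete, the following is a minimal toy sketch of one platform's local policy update, not the method proposed in the paper. It assumes a hypothetical concave surrogate for estimated performance, `sum_j v_j * log(1 + q_j)`, where `q_j` is the quota spent on partner `j` and `v_j` is an assumed data-value score for that partner. The allocation is optimized by projected gradient ascent (equivalently, gradient descent on the negative surrogate) under a fixed privacy budget. All function names, the surrogate, and the numbers are illustrative assumptions.

```python
import math


def project_to_budget(q, budget):
    """Euclidean projection onto {x >= 0, sum(x) == budget}."""
    sorted_desc = sorted(q, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - budget) / k
        if value - candidate > 0:
            theta = candidate  # last k that passes gives the threshold
    return [max(x - theta, 0.0) for x in q]


def estimated_performance(q, values):
    """Hypothetical surrogate: diminishing returns in each partner's quota."""
    return sum(v * math.log1p(x) for v, x in zip(values, q))


def allocate(values, budget, steps=500, lr=0.5):
    """Projected gradient ascent on the surrogate, starting from a uniform split."""
    n = len(values)
    q = [budget / n] * n
    for _ in range(steps):
        # Gradient of v * log(1 + x) with respect to x is v / (1 + x).
        grad = [v / (1.0 + x) for v, x in zip(values, q)]
        q = project_to_budget([x + lr * g for x, g in zip(q, grad)], budget)
    return q


if __name__ == "__main__":
    partner_values = [0.9, 0.5, 0.1]  # assumed data-value scores, not from the paper
    quotas = allocate(partner_values, budget=3.0)
    print([round(x, 3) for x in quotas])
```

In this toy setting the platform shifts quota toward partners with higher assumed value while never exceeding its budget. A multi-platform negotiation would additionally couple the platforms' updates, which this sketch does not attempt.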