Due to rising concerns about privacy protection, how to build machine learning (ML) models over different data sources with security guarantees is attracting increasing attention. Vertical federated learning (VFL) describes the case where ML models are built upon the private data of different participating parties that own disjoint features for the same set of instances, which fits many real-world collaborative tasks. Nevertheless, we find that existing solutions for VFL either support limited kinds of input features or suffer from potential data leakage during the federated execution. To this end, this paper investigates both the functionality and the security of ML models in the VFL scenario. Specifically, we introduce BlindFL, a novel framework for VFL training and inference. First, to address the functionality of VFL models, we propose federated source layers to unite the data from different parties. The federated source layers efficiently support various kinds of features, including dense, sparse, numerical, and categorical features. Second, we carefully analyze the security of the federated execution and formalize the privacy requirements. Based on this analysis, we devise secure and accurate algorithm protocols, and further prove their security guarantees under the ideal-real simulation paradigm. Extensive experiments show that BlindFL supports diverse datasets and models efficiently while achieving robust privacy guarantees.