Federated learning (FL) is an emerging privacy-preserving machine learning protocol that allows multiple devices to collaboratively train a shared global model without revealing their private local data. Non-parametric models such as gradient boosting decision trees (GBDT) are commonly used in FL on vertically partitioned data. However, existing studies assume that all data labels are stored on a single client, which can be unrealistic in real-world applications. In this work, we therefore propose a secure vertical FL framework, named PIVODL, to train GBDT with data labels distributed across multiple devices. Both homomorphic encryption and differential privacy are adopted to prevent label information from leaking through the transmitted gradients and leaf values. Our experimental results show that both the information leakage and the model performance degradation of the proposed PIVODL are negligible.
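The differentially private protection of leaf values can be illustrated with the standard Laplace mechanism: before a leaf value is shared, noise calibrated to a sensitivity bound and a privacy budget epsilon is added so individual labels cannot be inferred. The function name, parameters, and inverse-CDF sampling below are a minimal illustrative sketch, not PIVODL's exact implementation:

```python
import math
import random

def dp_leaf_value(leaf_value, sensitivity, epsilon):
    """Perturb a GBDT leaf value with the Laplace mechanism so the
    released value is epsilon-differentially private.

    `sensitivity` is an assumed upper bound on how much any single
    label can change the leaf value; `epsilon` is the privacy budget.
    """
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) by inverse-CDF transform of a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return leaf_value + noise
```

Because the noise has zero mean, the perturbed leaf values remain unbiased estimates of the true ones, which is why the accuracy degradation stays small for moderate privacy budgets.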