Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this context, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, the literature still lacks secure and practical gradient tree boosting models (e.g., XGB). In this paper, we aim to build a large-scale secure XGB under the vertically federated learning setting. We guarantee data privacy from three aspects. Specifically, (i) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (ii) we store the output model in a distributed manner to minimize information release, and (iii) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we improve the training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.