With growing interest in using data from multiple sources to build better machine learning models, many vertically federated learning algorithms have been proposed to preserve the data privacy of the participating organizations. However, the efficiency of existing vertically federated learning algorithms remains a major problem, especially when they are applied to large-scale real-world datasets. In this paper, we present a fast, accurate, scalable, and robust system for vertically federated random forest. With extensive optimization, we achieve speedups of $5\times$ for training and $83\times$ for serving over the state-of-the-art SecureBoost model \cite{cheng2019secureboost}. Moreover, the proposed system achieves similar accuracy while offering favorable scalability and partition tolerance. Our code has been made public to support further development by the community and the protection of user data privacy.