Federated Learning (FL) is an approach to collaboratively training a model across multiple parties without sharing data among the parties or with an aggregator. It is used in the consumer domain to protect personal data as well as in enterprise settings, where data-domicile regulation and the pragmatics of data silos are the main drivers. While gradient-boosted tree implementations such as XGBoost have been very successful for many use cases, their federated learning adaptations tend to be very slow because they rely on cryptographic and privacy-preserving methods, and they have not seen widespread use. We propose Party-Adaptive XGBoost (PAX) for federated learning, a novel implementation of gradient boosting that uses a party-adaptive histogram aggregation method and requires no data encryption. It constructs a surrogate representation of the data distribution that is used to find splits of the decision trees. Our experimental results demonstrate strong model performance, especially on non-IID distributions, and significantly faster training run-times than existing federated implementations across different data sets. This approach makes the use of gradient-boosted trees practical in enterprise federated learning.
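As a rough illustration only (not the exact PAX protocol; the function names, the fixed binning, and the single-feature scope are assumptions for this sketch), the snippet below shows the general idea behind federated histogram aggregation for split finding: each party shares only per-bin gradient/hessian sums, the aggregator merges these histograms, and candidate splits are scored on the merged surrogate with the standard XGBoost gain.

```python
import numpy as np

def local_histogram(feature_values, grads, hessians, bin_edges):
    """Run at each party: bin its own feature values and accumulate
    gradient/hessian sums per bin. Only these sums leave the party."""
    bins = np.digitize(feature_values, bin_edges)  # bin index per local sample
    n_bins = len(bin_edges) + 1
    g_hist = np.bincount(bins, weights=grads, minlength=n_bins)
    h_hist = np.bincount(bins, weights=hessians, minlength=n_bins)
    return g_hist, h_hist

def merged_split_gain(party_histograms, reg_lambda=1.0):
    """Run at the aggregator: sum the per-party histograms and scan the
    bin boundaries as candidate splits, scoring them with the usual
    XGBoost gain (0.5 factor and the gamma penalty omitted for brevity)."""
    g_hist = sum(g for g, _ in party_histograms)
    h_hist = sum(h for _, h in party_histograms)
    g_total, h_total = g_hist.sum(), h_hist.sum()
    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    for b in range(len(g_hist) - 1):
        g_left += g_hist[b]
        h_left += h_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left**2 / (h_left + reg_lambda)
                + g_right**2 / (h_right + reg_lambda)
                - g_total**2 / (h_total + reg_lambda))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```

In this sketch the raw feature values and labels never leave the parties; only aggregated histogram statistics are exchanged, which is what avoids the need for the heavy cryptographic machinery used by prior federated gradient-boosting implementations.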