The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme Gradient Boosting (XGBoost) on tabular data raise the need to train XGBoost in the context of federated learning (FL). Existing works on federated XGBoost in the horizontal setting rely on sharing gradients, which incurs per-node communication frequency and serious privacy concerns. To alleviate these problems, we develop an innovative framework for horizontal federated XGBoost that does not rely on gradient sharing and simultaneously improves privacy and communication efficiency by making the learning rates of the aggregated tree ensembles learnable. We conduct extensive evaluations on various classification and regression datasets, showing that our approach achieves performance comparable to the state-of-the-art method and effectively improves communication efficiency, lowering both the number of communication rounds and the communication overhead by factors ranging from 25x to 700x.
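A minimal sketch of the core idea, assuming standard xgboost and PyTorch APIs: each client trains a local ensemble and shares only its trees (no gradients); the receiver stacks the per-tree margin outputs of the aggregated trees into columns and fits one learnable learning rate per tree on local data. The function names (train_client_ensemble, per_tree_margins, fit_learning_rates) and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
import torch
import xgboost as xgb

def train_client_ensemble(X, y, n_trees=20):
    """Client side: train a local XGBoost ensemble on private data."""
    model = xgb.XGBRegressor(n_estimators=n_trees, max_depth=4, learning_rate=0.3)
    model.fit(X, y)
    return model

def per_tree_margins(model, X):
    """One margin column per tree: successive differences of cumulative
    margins isolate each tree's contribution (the first column also
    absorbs the base score)."""
    booster = model.get_booster()
    dmat = xgb.DMatrix(X)
    cum = np.column_stack([
        booster.predict(dmat, output_margin=True, iteration_range=(0, i + 1))
        for i in range(model.n_estimators)
    ])
    return np.diff(cum, axis=1, prepend=0.0)

def fit_learning_rates(margin_matrix, y, epochs=200):
    """Fit one trainable learning rate per aggregated tree, initialized at
    1.0, by minimizing squared error on local data."""
    eta = torch.ones(margin_matrix.shape[1], requires_grad=True)
    M = torch.tensor(margin_matrix, dtype=torch.float32)
    t = torch.tensor(y, dtype=torch.float32)
    opt = torch.optim.Adam([eta], lr=0.05)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((M @ eta - t) ** 2)
        loss.backward()
        opt.step()
    return eta.detach().numpy()

# Usage: concatenate the trees received from all clients, then adapt the
# per-tree rates on local data and predict with the rescaled ensemble.
# M_local = np.hstack([per_tree_margins(m, X_local) for m in client_models])
# rates = fit_learning_rates(M_local, y_local)
# prediction = M_local @ rates
```

Under these assumptions, communication reduces to one exchange of trained trees plus a small vector of learned rates, rather than per-node gradient traffic at every boosting round.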