There is great demand for scalable, secure, and efficient privacy-preserving machine learning models that can be trained over distributed data. While deep learning models typically achieve the best results in a centralized non-secure setting, different models can excel when privacy and communication constraints are imposed. Instead, tree-based approaches such as XGBoost have attracted much attention for their high performance and ease of use; in particular, they often achieve state-of-the-art results on tabular data. Consequently, several recent works have focused on translating Gradient Boosted Decision Tree (GBDT) models like XGBoost into federated settings, via cryptographic mechanisms such as Homomorphic Encryption (HE) and Secure Multi-Party Computation (MPC). However, these approaches do not always provide formal privacy guarantees, nor do they consider the full range of hyperparameters and implementation settings. In this work, we implement the GBDT model under Differential Privacy (DP). We propose a general framework that captures and extends existing approaches for differentially private decision trees. Our framework of methods is tailored to the federated setting, and we show that with a careful choice of techniques it is possible to achieve very high utility while maintaining strong levels of privacy.