The Gradient Boosting Decision Tree (GBDT) has become a popular machine learning model for a wide range of tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects that determine the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across the different trees in the GBDT model). Loose sensitivity bounds require more noise to be added to achieve a fixed privacy level, while ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. We therefore propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the properties of gradients and the contribution of each tree in a GBDT, we propose to adaptively control the gradients of the training data in each iteration and to clip leaf node values, in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework that allocates the privacy budget across trees so that the accuracy loss can be further reduced. Our experiments show that our approach achieves much better model accuracy than other baselines.
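To make the two sensitivity-tightening ideas concrete, below is a minimal Python sketch of gradient clipping, leaf clipping with Laplace noise, and a naive even privacy-budget split across trees. All names and parameters (g_max, v_max, eps_leaf, split_budget, etc.) are illustrative assumptions, and the even split is the baseline the paper improves on, not the paper's own allocation or its tighter sensitivity analysis.

```python
import numpy as np

def clip_gradients(grads, g_max):
    """Per-example gradient clipping: bounds each example's influence.
    This mirrors the abstract's first idea; g_max is an illustrative
    threshold, not the paper's adaptive choice."""
    return np.clip(grads, -g_max, g_max)

def private_leaf_value(grads, lam, v_max, eps_leaf, rng):
    """Leaf clipping + Laplace noise, a minimal sketch.

    Clipping the leaf value to [-v_max, v_max] caps its global
    sensitivity at 2 * v_max (changing one example can move the output
    by at most the width of the range), so Laplace noise with scale
    2 * v_max / eps_leaf makes this release eps_leaf-differentially
    private. The paper derives tighter, gradient-dependent bounds.
    """
    raw = -np.sum(grads) / (len(grads) + lam)   # squared-loss leaf value
    clipped = float(np.clip(raw, -v_max, v_max))
    return clipped + rng.laplace(scale=2.0 * v_max / eps_leaf)

def split_budget(total_eps, num_trees):
    """Naive even split of the budget across trees -- the baseline
    that a more effective cross-tree allocation would improve on."""
    return [total_eps / num_trees] * num_trees

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = clip_gradients(rng.normal(size=100), g_max=1.0)  # toy gradients
    eps_per_tree = split_budget(total_eps=1.0, num_trees=10)
    print(private_leaf_value(grads, lam=0.1, v_max=1.0,
                             eps_leaf=eps_per_tree[0], rng=rng))
```

Note the trade-off the abstract points to: a smaller v_max means less noise per leaf but more bias from clipping, and an even per-tree split wastes budget when later trees contribute little, which is why smarter cross-tree allocation helps.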