Vertical Federated Learning (FL) is a new paradigm that enables users holding non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent work shows that vertical FL alone is insufficient to prevent privacy leakage from the training process or the trained model. This paper studies privacy-preserving tree boosting algorithms under vertical FL. Existing cryptography-based solutions incur heavy computation and communication overhead and remain vulnerable to inference attacks. Solutions based on Local Differential Privacy (LDP) avoid these problems but yield models with low accuracy. This paper explores how to improve the accuracy of widely deployed tree boosting algorithms while satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost and design three order-preserving desensitization algorithms that satisfy a variant of LDP called distance-based LDP (dLDP) to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a tunable trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy than existing LDP approaches under reasonable settings. Our code is open source.
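To make the idea of order-preserving desensitization under dLDP concrete, the sketch below shows one standard way such a guarantee can be obtained: perturbing each value over a discretized domain with the exponential mechanism using the score u(x, o) = -|x - o|. This is a minimal illustration, not the paper's exact construction; the function name dldp_perturb, the domain discretization, and the parameter choices are all assumptions for the example. With this score, the output distribution satisfies Pr[M(x)=o] <= exp(eps * |x - x'|) * Pr[M(x')=o], so close inputs stay strongly indistinguishable while distant inputs are allowed weaker protection, which is what keeps the perturbed values roughly order-preserving.

```python
import numpy as np

def dldp_perturb(value, domain, eps):
    """Sample a noisy output for `value` from a finite ordered `domain`
    using the exponential mechanism with score u(x, o) = -|x - o|.
    Outputs near the true value are exponentially more likely, and the
    indistinguishability of two inputs degrades with their distance,
    i.e. a distance-based LDP (dLDP) style guarantee."""
    scores = -np.abs(domain - value)
    probs = np.exp(eps * scores / 2.0)   # exponential mechanism weights
    probs /= probs.sum()                 # normalize to a distribution
    return np.random.choice(domain, p=probs)

# Hypothetical usage: a party desensitizes its feature column before
# sharing it with the party that trains the boosted trees.
domain = np.arange(0, 101)              # discretized feature range [0, 100]
column = np.array([3, 15, 42, 90])      # raw (private) feature values
noisy = np.array([dldp_perturb(v, domain, eps=0.5) for v in column])
print(noisy)  # typically close to the originals and mostly in order
```

Because the noise concentrates around the true value, sort order among values that are far apart is preserved with high probability, which is the property tree boosting relies on when choosing split points.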