Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness in these domains is a foremost concern, existing in-processing Fair ML methods are either incompatible with GBDT, or incur significant performance losses while taking considerably longer to train. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we propose smooth convex error-rate proxies for common fairness criteria, enabling gradient-based optimization using a ``proxy-Lagrangian'' formulation. Our implementation shows an order-of-magnitude speedup in training time relative to related work, a pivotal aspect for fostering the widespread adoption of FairGBM by real-world practitioners.
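To make the proxy-Lagrangian idea concrete, below is a minimal sketch, not the FairGBM implementation (which builds on LightGBM): it shows how a smooth convex surrogate of a group's false-positive rate could supply gradients for the model update, while a Lagrange multiplier is updated by dual ascent on the observed constraint violation. The function names, the softplus surrogate, and the tolerance `EPSILON` are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

EPSILON = 0.05            # hypothetical tolerance on the group FPR gap
DUAL_LEARNING_RATE = 0.1  # step size for the dual (multiplier) ascent

def proxy_fpr(scores, labels, group_mask):
    """Smooth convex surrogate of a group's false-positive rate.

    The step indicator 1[score > 0] on label-negative instances is replaced
    by the softplus function log(1 + exp(score)), which is convex and
    differentiable in the raw boosting scores.
    """
    negatives = group_mask & (labels == 0)
    if not negatives.any():
        return 0.0, np.zeros_like(scores)
    surrogate = np.log1p(np.exp(scores[negatives])).mean()
    grad = np.zeros_like(scores)
    # d/ds softplus(s) = sigmoid(s), averaged over the group's negatives
    grad[negatives] = 1.0 / (1.0 + np.exp(-scores[negatives])) / negatives.sum()
    return surrogate, grad

def actual_fpr(scores, labels, group_mask):
    """Observed (non-differentiable) false-positive rate of one group."""
    negatives = group_mask & (labels == 0)
    return float((scores[negatives] > 0).mean()) if negatives.any() else 0.0

def constraint_step(scores, labels, group_a, group_b, multiplier):
    """One proxy-Lagrangian step for the constraint |FPR_a - FPR_b| <= EPSILON.

    The gradient w.r.t. the scores uses the smooth proxies (model player),
    while the multiplier update uses the observed violation (dual player).
    """
    proxy_a, grad_a = proxy_fpr(scores, labels, group_a)
    proxy_b, grad_b = proxy_fpr(scores, labels, group_b)
    sign = 1.0 if proxy_a >= proxy_b else -1.0
    constraint_grad = multiplier * sign * (grad_a - grad_b)

    violation = abs(actual_fpr(scores, labels, group_a)
                    - actual_fpr(scores, labels, group_b)) - EPSILON
    new_multiplier = max(0.0, multiplier + DUAL_LEARNING_RATE * violation)
    return constraint_grad, new_multiplier
```

In an actual GBDT training loop, `constraint_grad` would be added to the ordinary loss gradient handed to the tree learner at each boosting round, and the multiplier would be carried over to the next round.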