Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient boosted decision trees (GBDT) are popular in these settings due to their strong performance and low cost. However, fairness is a foremost concern in consequential decision-making. Despite GBDT's popularity, existing in-processing Fair ML methods are either inapplicable to GBDT, incur significant training-time overhead, or are inadequate for problems with high class imbalance -- a typical issue in these domains. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we employ a "proxy-Lagrangian" formulation using smooth convex error-rate proxies to enable gradient-based optimization. Our implementation shows an order-of-magnitude speedup in training time when compared with related work, a pivotal aspect for fostering widespread adoption of FairGBM by real-world practitioners.
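The core idea can be illustrated outside the GBDT setting. The sketch below is a minimal, hypothetical NumPy example of the proxy-Lagrangian dual ascent scheme on a logistic model: the primal player descends on a Lagrangian whose fairness constraint (here, an FPR gap between two groups) is replaced by a smooth cross-entropy proxy, while the dual player performs ascent on the multiplier using the *actual*, non-differentiable FPR gap. All data, hyperparameters, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one feature x, binary label y, protected group g.
# Group 1's feature is shifted so an unconstrained model gives it a higher FPR.
n = 4000
g = rng.integers(0, 2, n)
x = rng.normal(loc=0.8 * g, scale=1.0, size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 0.5 * g)))).astype(float)
X = np.column_stack([x, np.ones(n)])  # feature + intercept
neg = y == 0                          # mask of true negatives


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fpr_gap(p):
    # Actual (step-function, non-differentiable) FPR gap between groups.
    f0 = (p[neg & (g == 0)] > 0.5).mean()
    f1 = (p[neg & (g == 1)] > 0.5).mean()
    return abs(f1 - f0)


def train(constrained, steps=600, eta_w=0.3, eta_lam=0.05, epsilon=0.02):
    w, lam = np.zeros(2), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n  # gradient of mean binary cross-entropy
        if constrained:
            # Smooth proxy for each group's FPR: cross-entropy of that
            # group's negatives toward label 0; its gradient w.r.t. w is
            # the mean of p * x over the group's negatives.
            m0, m1 = neg & (g == 0), neg & (g == 1)
            g0 = X[m0].T @ p[m0] / m0.sum()
            g1 = X[m1].T @ p[m1] / m1.sum()
            proxy0 = -np.log(1 - p[m0] + 1e-9).mean()
            proxy1 = -np.log(1 - p[m1] + 1e-9).mean()
            sign = 1.0 if proxy1 >= proxy0 else -1.0
            # Primal player: descend on the proxy-Lagrangian.
            grad = grad + lam * sign * (g1 - g0)
            # Dual player: ascend on the multiplier using the ACTUAL
            # FPR gap, not the proxy (the proxy-Lagrangian trick).
            lam = max(0.0, lam + eta_lam * (fpr_gap(p) - epsilon))
        w = w - eta_w * grad
    return w, fpr_gap(sigmoid(X @ w))


w_unc, gap_unc = train(constrained=False)
w_fair, gap_fair = train(constrained=True)
```

In FairGBM the same two-player scheme is applied at each boosting round, with the proxy constraint's gradients entering the trees' target pseudo-residuals; the logistic model above only serves to make the alternating primal descent / dual ascent updates concrete.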