In automated, data-driven decision-making on sensitive data, an important concern is to learn predictors that perform well on the class label while minimising discrimination, induced by biased data, against any sensitive attribute such as gender or race. A few hybrid tree optimisation criteria exist that combine classification performance and fairness. Although the threshold-free ROC-AUC is the standard measure of performance for traditional classification models, current fair tree classification methods mainly optimise for a fixed threshold on both the classification task and the fairness metric. In this paper, we propose SCAFF (Splitting Criterion AUC for Fairness), a compound splitting criterion that combines threshold-free (i.e., strong) demographic parity with ROC-AUC and extends easily to bagged and boosted tree frameworks. Our method simultaneously leverages multiple sensitive attributes, whose values may be multicategorical or intersectional, and is tunable with respect to the unavoidable performance-fairness trade-off. In our experiments, we demonstrate how SCAFF generates models that attain both classification performance and fairness with respect to binary, multicategorical, and multiple sensitive attributes.
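The idea of a compound, threshold-free splitting criterion can be illustrated with a small sketch. The abstract does not state the formula, so everything below is an assumption made for illustration: the linear trade-off weighted by a hypothetical parameter `theta`, the restriction to a single binary sensitive attribute, the use of each child node's positive rate as the prediction score, and the function names `roc_auc`, `split_scores`, and `compound_criterion`. It is not the authors' implementation.

```python
import numpy as np


def roc_auc(labels, scores):
    """Threshold-free ROC-AUC via the Mann-Whitney U statistic (ties count 1/2)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return 0.5  # assumption: treat an undefined AUC as chance level
    diff = pos[:, None] - neg[None, :]  # all positive-negative score pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


def split_scores(y, left_mask):
    """Score each sample by the positive-class rate of the child node it falls into."""
    y = np.asarray(y, dtype=float)
    left_mask = np.asarray(left_mask, dtype=bool)
    scores = np.empty_like(y)
    for mask in (left_mask, ~left_mask):
        if mask.any():
            scores[mask] = y[mask].mean()
    return scores


def compound_criterion(y, s, left_mask, theta=0.5):
    """Hypothetical compound score for one candidate split (higher is better).

    Rewards ROC-AUC with respect to the class label y and penalises
    ROC-AUC with respect to the binary sensitive attribute s.
    theta in [0, 1] is an assumed trade-off weight: 0 ignores fairness,
    1 ignores classification performance.
    """
    scores = split_scores(y, left_mask)
    auc_y = roc_auc(y, scores)
    auc_s = roc_auc(s, scores)
    auc_s = max(auc_s, 1.0 - auc_s)  # separability of s, whichever group is favoured
    return (1.0 - theta) * auc_y - theta * auc_s


# Toy data: six samples, class label y and binary sensitive attribute s.
y = np.array([1, 1, 1, 0, 0, 0])
s = np.array([1, 1, 0, 1, 0, 0])

split_on_label_proxy = np.array([True, True, True, False, False, False])
split_on_sensitive = s.astype(bool)

score_proxy = compound_criterion(y, s, split_on_label_proxy, theta=0.5)
score_sensitive = compound_criterion(y, s, split_on_sensitive, theta=0.5)
print(score_proxy > score_sensitive)
```

In this toy setting, the split that mirrors the sensitive attribute is ranked below the split that separates the class label, because its scores separate the two sensitive groups perfectly. Setting `theta=0` reduces the sketch to plain ROC-AUC on the class label.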