As AI-based decision systems proliferate, their successful operationalization requires balancing multiple desiderata: predictive performance, disparity across groups, safeguarding of sensitive group attributes (e.g., race), and engineering cost. We present a holistic framework for evaluating and contextualizing fairness interventions with respect to these desiderata. The two key practical considerations are where the intervention is introduced (pre-, in-, or post-processing) and how it uses the sensitive group data. We demonstrate our framework with a thorough benchmarking study of predictive parity, covering close to 400 methodological variations across two major model types (XGBoost vs. neural networks) and ten datasets. The methodological insights derived from our empirical study inform the practical design of ML workflows with fairness as a central concern. We find that predictive parity is difficult to achieve without using group data, and that distributionally robust methods, despite requiring group data during model training (but not inference), provide significant Pareto improvements. Moreover, a plain XGBoost model often Pareto-dominates neural networks with fairness interventions, highlighting the importance of model inductive bias.
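For concreteness, the sketch below illustrates the fairness criterion the benchmark centers on: predictive parity asks that the positive predictive value (PPV) be equal across groups. The function name `predictive_parity_gap` and the max-minus-min gap summary are illustrative assumptions for this sketch, not the paper's exact evaluation metric.

```python
import numpy as np

def predictive_parity_gap(y_true, y_pred, group):
    """Gap in positive predictive value (PPV) across groups.

    Predictive parity requires P(Y=1 | Yhat=1, A=a) to be equal for all
    groups a; here the violation is summarized as max PPV minus min PPV.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    ppvs = []
    for a in np.unique(group):
        mask = (group == a) & (y_pred == 1)   # predicted positives in group a
        if mask.sum() == 0:                   # no predicted positives: skip group
            continue
        ppvs.append(y_true[mask].mean())      # PPV (precision) within group a
    return max(ppvs) - min(ppvs) if ppvs else 0.0

# Toy example with two groups and unequal precision among predicted positives
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(predictive_parity_gap(y_true, y_pred, group))  # 2/3 - 1/2 ≈ 0.17
```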