Algorithmic fairness has emerged as an important consideration when using machine learning to make high-stakes societal decisions. Yet improved fairness often comes at the expense of model accuracy. While aspects of the fairness-accuracy tradeoff have been studied, most work reports the fairness and accuracy of various models separately; without a model-agnostic metric that reflects the balance of the two desiderata, this makes comparisons between models nearly impossible. We seek to identify, quantify, and optimize the empirical Pareto frontier of the fairness-accuracy tradeoff. Specifically, we identify and outline the empirical Pareto frontier through Tradeoff-between-Fairness-and-Accuracy (TAF) Curves; we then develop a metric that quantifies this Pareto frontier through the weighted area under the TAF Curve, which we term the Fairness-Area-Under-the-Curve (FAUC). TAF Curves provide the first empirical, model-agnostic characterization of the Pareto frontier, while FAUC provides the first metric to impartially compare model families on both fairness and accuracy. Both TAF Curves and FAUC can be employed with all group fairness definitions and accuracy measures. Next, we ask: is it possible to expand the empirical Pareto frontier, and thus improve the FAUC, for a given collection of fitted models? We answer affirmatively by developing a novel fair model stacking framework, FairStacks, that solves a convex program to maximize the accuracy of a model ensemble subject to a score-bias constraint. We show that optimizing with FairStacks always expands the empirical Pareto frontier and improves the FAUC; we additionally study other theoretical properties of our proposed approach. Finally, we empirically validate TAF, FAUC, and FairStacks through studies on several real benchmark data sets, showing that FairStacks yields major improvements in FAUC that outperform existing algorithmic fairness approaches.
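The TAF Curve and FAUC ideas above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each fitted model is summarized by a hypothetical (unfairness, accuracy) pair, the empirical Pareto frontier is taken to be the set of non-dominated models, and the TAF Curve is treated as the step function giving the best accuracy attainable at or below each unfairness level. The abstract does not specify how FAUC is weighted, so the hypothetical helper `step_area` computes an unweighted area over an assumed range `[0, max_unfairness]`, with accuracy taken as 0 below the fairest model.

```python
from typing import List, Sequence, Tuple

# Each fitted model is summarized by (unfairness, accuracy): lower unfairness and
# higher accuracy are both better. "unfairness" can be any group-fairness gap.
Point = Tuple[float, float]


def pareto_frontier(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated (unfairness, accuracy) points, sorted by unfairness."""
    frontier: List[Point] = []
    best_acc = float("-inf")
    # Sweep from fairest to least fair (ties: most accurate first); keep a model only
    # if it is more accurate than every fairer or equally fair model seen so far.
    for unfair, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            frontier.append((unfair, acc))
            best_acc = acc
    return frontier


def step_area(frontier: Sequence[Point], max_unfairness: float) -> float:
    """Unweighted area under t -> best accuracy with unfairness <= t, on [0, max_unfairness].

    Hypothetical stand-in for FAUC (the paper's weighting is not reproduced here);
    unfairness budgets below the fairest model contribute 0.
    """
    area = 0.0
    for i, (unfair, acc) in enumerate(frontier):
        if unfair >= max_unfairness:
            break
        # The step at this accuracy lasts until the next frontier point (or the range end).
        nxt = frontier[i + 1][0] if i + 1 < len(frontier) else max_unfairness
        area += acc * (min(nxt, max_unfairness) - unfair)
    return area


# Hypothetical (unfairness, accuracy) summaries for five fitted models.
models = [(0.02, 0.70), (0.05, 0.80), (0.08, 0.78), (0.10, 0.85), (0.05, 0.75)]
frontier = pareto_frontier(models)
print(frontier)  # [(0.02, 0.7), (0.05, 0.8), (0.1, 0.85)]
print(step_area(frontier, max_unfairness=0.12))  # approximately 0.078
```

Here the models at (0.05, 0.75) and (0.08, 0.78) are dominated and drop out. Because the frontier depends only on the summary pairs, the same code applies to any choice of group fairness gap and accuracy measure.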
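The stacking idea behind FairStacks can be illustrated with a deliberately small sketch. The abstract states only that FairStacks solves a convex program that maximizes ensemble accuracy subject to a score-bias constraint; every specific choice below is an assumption: ensemble weights lie on the probability simplex, accuracy is represented by mean squared error on a validation set, and score bias is the absolute gap in mean ensemble score between two groups. Instead of a convex solver, the hypothetical `fair_stack` function runs an exhaustive grid search over the simplex, which is only practical for a handful of base models. Because each base model is a vertex of the simplex (and the grid always contains the vertices), any base model that meets the bias budget is itself a feasible ensemble, so the stacked result is never worse than the best feasible base model; this mirrors the intuition for why stacking can only expand the empirical frontier.

```python
from itertools import product
from typing import Iterator, List, Optional, Sequence, Tuple


def mse(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Mean squared error (assumed stand-in for the accuracy objective)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


def score_bias(scores: Sequence[float], groups: Sequence[int]) -> float:
    """Absolute gap in mean score between group 1 and group 0 (assumed bias measure)."""
    g1 = [s for s, g in zip(scores, groups) if g == 1]
    g0 = [s for s, g in zip(scores, groups) if g == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


def simplex_grid(k: int, steps: int) -> Iterator[List[float]]:
    """Yield length-k weight vectors with entries in {0, 1/steps, ..., 1} that sum to 1."""
    for combo in product(range(steps + 1), repeat=k - 1):
        rest = steps - sum(combo)
        if rest >= 0:
            yield [c / steps for c in combo] + [rest / steps]


def fair_stack(
    base_scores: Sequence[Sequence[float]],  # one score list per fitted model
    labels: Sequence[float],
    groups: Sequence[int],
    tau: float,  # hypothetical score-bias budget
    steps: int = 20,
) -> Optional[Tuple[List[float], float]]:
    """Grid-search stand-in for a bias-constrained stacking program.

    Returns (weights, loss) for the lowest-loss blend whose score bias is at most tau,
    or None if no grid point is feasible.
    """
    best: Optional[Tuple[List[float], float]] = None
    for w in simplex_grid(len(base_scores), steps):
        # Ensemble score for each example is a weighted sum of base-model scores.
        blended = [sum(wj * col[i] for wj, col in zip(w, base_scores)) for i in range(len(labels))]
        if score_bias(blended, groups) > tau + 1e-12:  # small tolerance for float error
            continue
        loss = mse(blended, labels)
        if best is None or loss < best[1]:
            best = (w, loss)
    return best


# Toy validation set: labels are perfectly correlated with group membership.
labels = [1.0, 1.0, 0.0, 0.0]
groups = [1, 1, 0, 0]
model_a = [1.0, 1.0, 0.0, 0.0]  # perfect fit, score bias 1.0
model_b = [0.5, 0.5, 0.5, 0.5]  # no score bias, loss 0.25
weights, loss = fair_stack([model_a, model_b], labels, groups, tau=0.5)
print(weights, loss)  # [0.5, 0.5] 0.0625
```

In this toy example only model B satisfies the bias budget of 0.5 on its own, with a loss of 0.25; the equal-weight blend also satisfies it and lowers the loss to 0.0625. A convex solver would replace the grid search in any realistic setting, since the number of grid points grows combinatorially with the number of base models.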