Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting is a method that can mitigate such bias in model predictions by assigning a weight to each data point used during model training. In this paper, we compare three methods for generating these weights: (1) evolving them using a Genetic Algorithm (GA), (2) computing them using only dataset characteristics, and (3) assigning equal weights to all data points. Model performance under each strategy was evaluated using paired predictive and fairness metrics, which also served as optimization objectives for the GA during evolution. Specifically, we used two predictive metrics (accuracy and area under the Receiver Operating Characteristic curve) and two fairness metrics (demographic parity difference and subgroup false negative fairness). Using experiments on eleven publicly available datasets (including two medical datasets), we show that evolved sample weights can produce models that achieve better trade-offs between fairness and predictive performance than alternative weighting methods. However, the magnitude of these benefits depends strongly on the choice of optimization objectives. Our experiments reveal that optimizing with accuracy and demographic parity difference metrics yields the largest number of datasets for which evolved weights are significantly better than other weighting strategies in optimizing both objectives.
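To make two of the quantities named above concrete, here is a minimal sketch in plain Python. It assumes that the dataset-characteristic weights follow the common reweighing scheme w(g, y) = P(g) P(y) / P(g, y), and that demographic parity difference is the gap between the highest and lowest positive-prediction rates across groups. The abstract does not specify either formula, so both are assumptions, and the function names and toy data are hypothetical. The GA-evolved weights and the subgroup false negative fairness metric are not shown.

```python
from collections import Counter


def equal_weights(n):
    """Baseline strategy: every data point gets the same weight."""
    return [1.0] * n


def reweighing_weights(groups, labels):
    """Weights computed from dataset characteristics only.

    Assumed scheme (not confirmed by the abstract):
    w(g, y) = P(g) * P(y) / P(g, y), which up-weights
    under-represented (group, label) combinations.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def demographic_parity_difference(preds, groups):
    """Gap between the largest and smallest positive-prediction rate per group."""
    rates = []
    for g in set(groups):
        member_preds = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(member_preds) / len(member_preds))
    return max(rates) - min(rates)


# Hypothetical toy data: group "a" has a higher positive rate than group "b".
groups = ["a", "a", "a", "b", "b", "b", "b", "b"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

weights = reweighing_weights(groups, labels)
# Disparity already present in the labels themselves.
dpd = demographic_parity_difference(labels, groups)
```

Under these assumptions the weights sum to the number of data points, and the rarer (group, label) combinations receive the larger weights. Many training APIs accept such a vector as a per-sample weight, for example the `sample_weight` argument to `fit` on scikit-learn estimators that support it.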