Demographic biases in source datasets have been shown to be one of the causes of unfairness and discrimination in the predictions of Machine Learning models. One of the most prominent types of demographic bias is the statistical imbalance in the representation of demographic groups in a dataset. In this paper, we study the measurement of these biases by reviewing existing metrics, including those that can be borrowed from other disciplines. We develop a taxonomy for classifying these metrics, providing a practical guide for selecting appropriate ones. To illustrate the utility of our framework, and to further understand the practical characteristics of the metrics, we conduct a case study of 20 datasets used in Facial Emotion Recognition (FER), analyzing the biases present in them. Our experimental results show that many of the metrics are redundant and that a reduced subset may be sufficient to measure the amount of demographic bias. The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models. The code is available at https://github.com/irisdominguez/dataset_bias_metrics.
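To make the notion of representational imbalance concrete, the following is a minimal sketch of one common imbalance measure, the normalized Shannon evenness of demographic group counts; the choice of this particular metric and the example data are illustrative assumptions, not a summary of the paper's full set of reviewed metrics.

```python
import math
from collections import Counter

def shannon_evenness(group_labels):
    """Normalized Shannon evenness of demographic groups in a dataset.

    Returns 1.0 for perfectly balanced groups and values near 0 when
    the dataset is dominated by a single group.
    """
    counts = Counter(group_labels)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 1.0  # a single group is trivially "even"
    # Shannon entropy of the empirical group distribution
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Normalize by the maximum possible entropy, log(k)
    return entropy / math.log(k)

# Hypothetical example: a dataset heavily skewed toward one demographic group
labels = ["group_a"] * 900 + ["group_b"] * 80 + ["group_c"] * 20
print(f"Shannon evenness: {shannon_evenness(labels):.3f}")  # well below 1.0
```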