The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within that data. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction settings, in which most real-world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been paid to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a real-world account-opening fraud detection case study as an example, we examine the dangers to both performance and fairness of two typical biases in performative prediction: distribution shifts and the problem of selective labels.