Increased awareness of the risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. The vast majority of proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently, and at the intersection of these two categories, methods for active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when label bias is overlooked, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels during data collection may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.