Research on algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, attention has shifted to sources of bias that implicate fairness at other stages of the ML pipeline. We contend that one such source, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and argue that the choice of metric for model comparison, especially a metric that does not take variability into account, can significantly bias model selection outcomes.