The measurement of bias in machine learning often focuses on model performance across identity subgroups (such as man and woman) with respect to ground-truth labels. However, these methods do not directly measure the associations that a model may have learned, for example between labels and identity subgroups. Further, measuring a model's bias requires a fully annotated evaluation dataset, which may not be easily available in practice. We present an elegant mathematical solution that tackles both issues simultaneously, using image classification as a working example. By treating a classification model's predictions for a given image as a set of labels analogous to a bag of words, we rank the biases that a model has learned with respect to different identity labels. We use (man, woman) as a concrete example of an identity label set (although this set need not be binary), and present rankings for the labels that are most biased towards one identity or the other. We demonstrate how the statistical properties of different association metrics can lead to different rankings of the most "gender biased" labels, and conclude that normalized pointwise mutual information (nPMI) is most useful in practice. Finally, we announce an open-source nPMI visualization tool built on TensorBoard.
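The ranking described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each image's predictions arrive as a set of labels (the bag-of-words view), counts co-occurrences between identity labels and all other labels, and scores each pair with nPMI, defined as pmi(x, y) / (−log p(x, y)), which normalizes PMI into [−1, 1]. The function name and data layout are hypothetical.

```python
import math
from collections import Counter

def npmi_rankings(predictions, identity_labels):
    """Rank non-identity labels by nPMI with each identity label.

    predictions: list of per-image label sets (bag-of-words predictions).
    identity_labels: set of identity labels, e.g. {"man", "woman"}.
    Returns {identity: [(label, npmi), ...]} sorted by descending nPMI,
    so the most strongly associated ("biased") labels come first.
    """
    n = len(predictions)
    count = Counter()  # marginal counts per label
    co = Counter()     # joint counts for (identity, label) pairs
    for labels in predictions:
        for lab in labels:
            count[lab] += 1
        for ident in identity_labels & labels:
            for lab in labels - identity_labels:
                co[(ident, lab)] += 1

    rankings = {}
    for ident in identity_labels:
        scored = []
        for lab in count:
            if lab in identity_labels or (ident, lab) not in co:
                continue
            p_xy = co[(ident, lab)] / n
            p_x, p_y = count[ident] / n, count[lab] / n
            pmi = math.log(p_xy / (p_x * p_y))
            # Dividing by -log p(x, y) bounds the score in [-1, 1].
            scored.append((lab, pmi / -math.log(p_xy)))
        rankings[ident] = sorted(scored, key=lambda t: -t[1])
    return rankings
```

On a toy set of predictions where "tie" co-occurs only with "man", "dress" only with "woman", and "dog" with both equally, the ranking surfaces "tie" as the most man-associated label and scores "dog" near zero, matching the intuition that nPMI highlights labels whose co-occurrence with an identity exceeds chance.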