Despite impressive advances in object recognition, the performance of deep learning systems degrades significantly across geographies and at lower income levels, raising pressing concerns about inequity. Addressing these performance gaps remains a challenge, as little is understood about why performance degrades across incomes or geographies. We take a step in this direction by annotating images from Dollar Street, a popular benchmark of geographically and economically diverse images, labeling each image with factors such as color, shape, and background. These annotations unlock a new granular view into how objects differ across incomes and regions. We then use these object differences to pinpoint model vulnerabilities across incomes and regions. We study a range of modern vision models and find that performance disparities are most associated with differences in texture, occlusion, and images with darker lighting. We illustrate how insights from our factor labels can surface mitigations that reduce models' performance disparities. As an example, we show that mitigating a model's vulnerability to texture can improve performance at the lower income level. We release all factor annotations along with an interactive dashboard to facilitate research into more equitable vision systems.