Image captioning is an important task for benchmarking visual reasoning and for enabling accessibility for people with vision impairments. However, as in many machine learning settings, social biases can influence image captioning in undesirable ways. In this work, we study bias propagation pathways within image captioning, focusing specifically on the COCO dataset. Prior work has analyzed gender bias in captions using automatically derived gender labels; here we examine racial and intersectional biases using manual annotations. Our first contribution is annotating the perceived gender and skin color of 28,315 of the depicted people after obtaining IRB approval. Using these annotations, we compare racial biases present in both manually written and automatically generated image captions. We demonstrate differences in caption performance, sentiment, and word choice between images of lighter-skinned versus darker-skinned people. Further, we find the magnitude of these differences to be greater in modern captioning systems than in older ones, raising the concern that, without proper consideration and mitigation, these differences will only become more prevalent. Code and data are available at https://princetonvisualai.github.io/imagecaptioning-bias.
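As a rough illustration of the kind of group-level comparison the abstract describes (caption sentiment broken down by annotated skin-tone group), here is a minimal sketch. The input format, the `captions` list, and the choice of NLTK's VADER scorer are illustrative assumptions, not the paper's actual annotation pipeline or metrics.

```python
# Minimal sketch: compare mean caption sentiment across annotated
# skin-tone groups. The record format and the VADER scorer are
# illustrative assumptions, not the paper's methodology.
from collections import defaultdict
from statistics import mean

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch

# Hypothetical records: (caption text, annotated skin-tone group).
captions = [
    ("a person smiling at the camera", "lighter"),
    ("a man standing in a dark alley", "darker"),
    # ... one entry per annotated image ...
]

analyzer = SentimentIntensityAnalyzer()
scores = defaultdict(list)
for text, group in captions:
    # VADER's compound score ranges from -1 (negative) to +1 (positive).
    scores[group].append(analyzer.polarity_scores(text)["compound"])

for group, vals in scores.items():
    print(f"{group}: mean sentiment = {mean(vals):+.3f} (n={len(vals)})")
```

In an analysis like the one described, a gap between the per-group means would be one signal of the sentiment differences the authors report.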