Image captioning models are known to perpetuate and amplify harmful societal bias present in the training set. In this work, we aim to mitigate such gender bias in image captioning models. While prior work has addressed this problem by forcing models to focus on people to reduce gender misclassification, it conversely generates gender-stereotypical words at the expense of predicting the correct gender. From this observation, we hypothesize that two types of gender bias affect image captioning models: 1) bias that exploits context to predict gender, and 2) bias in the probability of generating certain (often stereotypical) words because of gender. To mitigate both types of bias, we propose a framework, called LIBRA, that learns from synthetically biased samples, correcting gender misclassification and changing gender-stereotypical words to more neutral ones.
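The abstract describes training a debiaser on synthetically biased captions that reproduce the two bias types. Below is a minimal sketch of that data-synthesis idea, assuming illustrative names (`make_biased_sample`, `GENDER_SWAP`, `STEREOTYPE_WORDS`) and toy word lists that are not from the paper; LIBRA's actual synthesis procedure and model are defined in the paper itself.

```python
# A minimal sketch of the synthetic-bias idea, NOT the authors' implementation.
# GENDER_SWAP and STEREOTYPE_WORDS are toy, assumed word lists for illustration.
import random

GENDER_SWAP = {"man": "woman", "woman": "man", "he": "she", "she": "he"}
STEREOTYPE_WORDS = {"man": ["skateboard", "tie"], "woman": ["kitchen", "umbrella"]}

def make_biased_sample(caption: str, gender: str) -> str:
    """Corrupt a ground-truth caption with one of the two hypothesized bias types."""
    tokens = caption.lower().split()
    if random.random() < 0.5:
        # Type 1: context-exploiting bias -> flip the gender word, so a debiaser
        # must recover the correct gender from the image rather than the context.
        tokens = [GENDER_SWAP.get(t, t) for t in tokens]
    else:
        # Type 2: gender-conditioned word bias -> inject a stereotypical word
        # associated with the (correct) gender.
        tokens.append(random.choice(STEREOTYPE_WORDS[gender]))
    return " ".join(tokens)

# Training pairs for a caption-editing debiaser: (biased caption, image) -> original caption.
original = "a man riding a horse in a field"
print(make_biased_sample(original, "man"))
# e.g. "a woman riding a horse in a field"  or  "a man riding a horse in a field tie"
```

In this sketch, the debiaser would then be trained to map each corrupted caption (together with image features) back to the original, which corresponds to correcting gender misclassification and neutralizing stereotypical words.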