Gender biases are known to exist within large-scale visual datasets and can be reflected or even amplified in downstream models. Many prior works have proposed methods for mitigating gender biases, often by attempting to remove gender expression information from images. To understand the feasibility and practicality of these approaches, we investigate what $\textit{gender artifacts}$ exist within large-scale visual datasets. We define a $\textit{gender artifact}$ as a visual cue that is correlated with gender, focusing specifically on those cues that are learnable by a modern image classifier and have an interpretable human corollary. Through our analyses, we find that gender artifacts are ubiquitous in the COCO and OpenImages datasets, occurring everywhere from low-level information (e.g., the mean value of the color channels) to the higher-level composition of the image (e.g., pose and location of people). Given the prevalence of gender artifacts, we claim that attempts to remove gender artifacts from such datasets are largely infeasible. Instead, the responsibility lies with researchers and practitioners to be aware that the distribution of images within datasets is highly gendered, and hence to develop methods that are robust to these distributional shifts across groups.
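To make the low-level claim concrete, the sketch below shows how even a 3-dimensional feature (the per-image mean of each color channel) can separate two groups of images whose color statistics differ slightly. This is an illustrative toy on synthetic data, not the paper's actual experimental setup; the group construction, feature extractor, and nearest-centroid classifier are all assumptions chosen for brevity.

```python
import numpy as np

def mean_color_features(images):
    """Per-image mean of each color channel: a 3-dim 'low-level' feature.

    images: array of shape (N, H, W, 3) with values in [0, 255].
    Returns an array of shape (N, 3).
    """
    return images.reshape(len(images), -1, 3).mean(axis=1)

# Synthetic stand-in for two image groups whose channel means differ
# slightly (hypothetical; real datasets exhibit subtler shifts).
rng = np.random.default_rng(0)
group_a = rng.integers(100, 140, size=(50, 8, 8, 3)).astype(float)
group_b = rng.integers(120, 160, size=(50, 8, 8, 3)).astype(float)

X = mean_color_features(np.concatenate([group_a, group_b]))
y = np.array([0] * 50 + [1] * 50)

# A trivial nearest-centroid classifier on the 3-dim features.
centroids = np.stack([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])
pred = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
accuracy = (pred == y).mean()
```

Even this near-trivial classifier separates the groups well when their mean channel values differ, which is why stripping such distributional cues from a dataset is so difficult: the signal is diffused across every pixel rather than localized to any removable region.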