Image classifiers often rely heavily on peripheral attributes that are strongly correlated with the target class (i.e., dataset bias) when making predictions. Recently, a myriad of studies have focused on mitigating such dataset bias, a task referred to as debiasing. However, these debiasing methods often use inconsistent experimental settings (e.g., datasets and neural network architectures). Additionally, most previous studies on debiasing do not specify how they select their model parameters, which involves early stopping and hyper-parameter tuning. The goal of this paper is to standardize these inconsistent experimental settings and propose a consistent model parameter selection criterion for debiasing. Based on such unified experimental settings and model parameter selection criterion, we build a benchmark named DebiasBench which includes five datasets and seven debiasing methods. We carefully conduct extensive experiments from various perspectives and show that different state-of-the-art methods work best on different datasets. Even the vanilla method, i.e., a model with no debiasing module, shows competitive results on datasets with low bias severity. We publicly release the implementations of existing debiasing methods in DebiasBench to encourage future researchers in debiasing to conduct fair comparisons and further push the state-of-the-art performance.