In image classification, "debiasing" aims to train a classifier that is less susceptible to dataset bias, i.e., a strong correlation between peripheral attributes of data samples and a target class. For example, even if the frog class in the dataset consists mainly of frog images with a swamp background (i.e., bias-aligned samples), a debiased classifier should still correctly classify a frog at a beach (i.e., a bias-conflicting sample). Recent debiasing approaches commonly use two components: a biased model $f_B$ and a debiased model $f_D$. $f_B$ is trained to focus on bias-aligned samples (i.e., it is overfitted to the bias), while $f_D$ is trained mainly on bias-conflicting samples by concentrating on the samples that $f_B$ fails to learn, which makes $f_D$ less susceptible to the dataset bias. While state-of-the-art debiasing techniques have aimed to better train $f_D$, we focus on training $f_B$, a component that has been overlooked until now. Our empirical analysis reveals that removing the bias-conflicting samples from the training set of $f_B$ is important for improving the debiasing performance of $f_D$. This is because bias-conflicting samples do not contain the bias attribute and therefore act as noisy samples when the bias of $f_B$ is amplified. To this end, we propose a simple yet effective data sample selection method that removes the bias-conflicting samples to construct a bias-amplified dataset for training $f_B$. Our data sample selection method can be applied directly to existing reweighting-based debiasing approaches, yielding consistent performance gains and achieving state-of-the-art performance on both synthetic and real-world datasets.
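The two ideas above, selecting a bias-amplified subset for $f_B$ and reweighting samples for $f_D$, can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the abstract: the selection rule (keep a sample only if a preliminary biased model gives its ground-truth class a probability above a threshold), the threshold value, the relative-difficulty weight $\ell_B / (\ell_B + \ell_D)$, and all function names are hypothetical stand-ins rather than the proposed method itself.

```python
import math


def cross_entropy(probs, label):
    """Per-sample cross-entropy given a list of class probabilities."""
    return -math.log(max(probs[label], 1e-12))


def select_bias_amplified(probs_per_sample, labels, confidence_threshold=0.5):
    """Return indices of samples kept for training the biased model f_B.

    Hypothetical rule (an assumption, not the paper's procedure): a sample is
    treated as bias-aligned, and kept, if a preliminary biased model assigns
    its ground-truth class at least `confidence_threshold` probability.
    """
    return [
        i
        for i, (probs, y) in enumerate(zip(probs_per_sample, labels))
        if probs[y] >= confidence_threshold
    ]


def relative_difficulty(loss_b, loss_d, eps=1e-8):
    """Assumed reweighting score for f_D: close to 1 when f_B's loss dominates,
    which suggests a bias-conflicting sample; close to 0 otherwise."""
    return loss_b / (loss_b + loss_d + eps)


# Toy predictions of a preliminary biased model on four samples, two classes.
# Sample 2 plays the "frog at a beach": its true class is 0, but the biased
# model follows the background and prefers class 1.
probs_b = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.3, 0.7]]
labels = [0, 0, 0, 1]

kept = select_bias_amplified(probs_b, labels)
print(kept)  # sample 2 is dropped from f_B's training set

# Reweighting for f_D: the sample f_B fails on receives the larger weight.
w_conflict = relative_difficulty(
    cross_entropy(probs_b[2], labels[2]), cross_entropy([0.6, 0.4], 0)
)
w_aligned = relative_difficulty(
    cross_entropy(probs_b[0], labels[0]), cross_entropy([0.7, 0.3], 0)
)
print(w_conflict > w_aligned)
```

In this toy setting the bias-conflicting sample is excluded from the subset used to train $f_B$, yet it still receives a high weight when training $f_D$, which mirrors the division of labour between the two models described above.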