Prior work has shown that visual recognition datasets frequently underrepresent bias groups $B$ (\eg Female) within class labels $Y$ (\eg Programmers). This dataset bias can lead to models that learn spurious correlations between class labels and bias groups such as age, gender, or race. Most recent methods that address this problem require significant architectural changes or additional loss functions that demand more hyperparameter tuning. Alternatively, data sampling baselines from the class imbalance literature (\eg Undersampling, Upweighting), which can often be implemented in a single line of code and often have no hyperparameters, offer a cheaper and more efficient solution. However, these methods suffer from significant shortcomings. For example, Undersampling drops a significant part of the input distribution, while Oversampling repeats samples, causing overfitting. To address these shortcomings, we introduce a new class-conditioned sampling method: Bias Mimicking (BM). The method is based on the observation that if the bias distribution of a class $c$, \ie $P_D(B|Y=c)$, is mimicked across every class $c^{\prime}\neq c$, then $Y$ and $B$ are statistically independent, \ie $P_D(Y|B) = P_D(Y)$. Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution without repeating samples. Consequently, Bias Mimicking improves the average accuracy of sampling methods on underrepresented groups by 3\% over four benchmarks while maintaining, and sometimes improving, performance compared to non-sampling methods.
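
The observation can be checked with a short derivation. This is a sketch under assumptions not stated in the abstract: $Y$ and $B$ are discrete, $P_D(B=b)>0$ for every bias value $b$, and the value symbols $b$ and $c^{\prime}$ are introduced here for illustration. Suppose every class mimics the bias distribution of class $c$, \ie $P_D(B=b|Y=c^{\prime}) = P_D(B=b|Y=c)$ for all $c^{\prime}$ and $b$. Marginalizing over classes gives
\[
P_D(B=b) = \sum_{c^{\prime}} P_D(B=b|Y=c^{\prime})\,P_D(Y=c^{\prime}) = P_D(B=b|Y=c)\sum_{c^{\prime}} P_D(Y=c^{\prime}) = P_D(B=b|Y=c),
\]
and Bayes' rule then yields
\[
P_D(Y=c|B=b) = \frac{P_D(B=b|Y=c)\,P_D(Y=c)}{P_D(B=b)} = P_D(Y=c).
\]
Because the same bias distribution is shared by every class, the identity holds for each class, so knowing the bias group carries no information about the class label.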