Simplicity bias is the concerning tendency of deep networks to over-depend on simple, weakly predictive features, to the exclusion of stronger, more complex features. This causes biased, incorrect model predictions in many real-world applications, and is exacerbated by incomplete training data containing spurious feature-label correlations. We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the feature sieve. We aim to automatically identify and suppress easily-computable spurious features in lower layers of the network, thereby allowing the higher layers of the network to extract and utilize richer, more meaningful representations. We provide concrete evidence of this differential suppression and enhancement of relevant features on both controlled datasets and real-world images, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on ImageNet-A; 3.2% on BAR; etc.). Crucially, we outperform many baselines that incorporate knowledge about known spurious or biased attributes, despite our method not using any such information. We believe that our feature sieve work opens up exciting new research directions in automated adversarial feature extraction and representation learning for deep networks.
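The suppression step described above can be illustrated with a minimal sketch. One plausible realization (the abstract does not specify the loss; the function and values below are illustrative assumptions, not the paper's exact formulation) is a "forgetting" objective: an auxiliary classifier reads lower-layer features, and the main network is penalized by the cross-entropy between that classifier's prediction and the uniform distribution, so that minimizing the loss makes lower-layer features uninformative about the label.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def forgetting_loss(aux_logits):
    """Cross-entropy between the auxiliary head's prediction and the
    uniform distribution over K classes. It attains its minimum, log K,
    exactly when the auxiliary head cannot predict the label from the
    lower-layer features, i.e. when those features are 'sieved out'."""
    probs = softmax(aux_logits)
    k = len(aux_logits)
    return -sum((1.0 / k) * math.log(p) for p in probs)

# A peaked auxiliary prediction (the lower layers still encode a simple,
# label-predictive feature) incurs a higher loss than an uninformative,
# uniform prediction, so gradient descent pushes toward suppression.
uninformative = forgetting_loss([0.0, 0.0, 0.0, 0.0])  # equals log 4
peaked = forgetting_loss([5.0, 0.0, 0.0, 0.0])         # strictly larger
```

In a full training loop this loss would be applied to the feature extractor's parameters (with the auxiliary head held fixed), alternating with ordinary classification training of the auxiliary head itself.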