Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep convolutional neural networks (DCNNs) often fail to generalize to out-of-distribution (o.o.d.) samples. One cause of this shortcoming is that modern architectures tend to rely on "shortcuts": superficial features that correlate with categories without capturing the deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that varies superficially across contexts, so the most intuitive and promising solution in one context may not generalize to others. One potential way to improve o.o.d. generalization is to assume that simple solutions are unlikely to be valid across contexts and to downweight them, an inductive bias we refer to as the too-good-to-be-true prior. We implement this bias in a two-stage approach that uses predictions from a low-capacity network (LCN) to inform the training of a high-capacity network (HCN). Because the shallow architecture of the LCN can learn only surface relationships, which include shortcuts, we downweight training items for the HCN that the LCN can master, thereby encouraging the HCN to rely on deeper, invariant features that should generalize more broadly. Using a modified version of the CIFAR-10 dataset into which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization.
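The following is a minimal sketch of the two-stage LCN-HCN procedure, assuming PyTorch. The abstract does not specify the weighting rule, so the choice here (one minus the LCN's probability of the correct class, so items the LCN masters contribute little to the HCN's loss) is one plausible instantiation; the network constructors and the data loader yielding example indices are likewise assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train(model, loader, epochs=10, weights=None, lr=1e-3):
    """Train `model`, optionally scaling each example's loss by a weight."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for idx, x, y in loader:  # assumed: loader yields example indices
            loss = F.cross_entropy(model(x), y, reduction="none")
            if weights is not None:
                loss = loss * weights[idx]
            opt.zero_grad()
            loss.mean().backward()
            opt.step()

@torch.no_grad()
def lcn_weights(lcn, loader, n_examples):
    """Downweight items the LCN masters: w_i = 1 - p_LCN(y_i | x_i)."""
    w = torch.ones(n_examples)
    for idx, x, y in loader:
        p = F.softmax(lcn(x), dim=1)
        w[idx] = 1.0 - p[torch.arange(len(y)), y]
    return w

# Stage 1: fit the low-capacity network, which can exploit only
# surface features such as shortcuts.
# Stage 2: train the high-capacity network with the resulting weights,
# so shortcut-solvable items barely influence its loss.
# (make_lcn / make_hcn are hypothetical constructors.)
# lcn, hcn = make_lcn(), make_hcn()
# train(lcn, train_loader)
# w = lcn_weights(lcn, train_loader, len(train_set))
# train(hcn, train_loader, weights=w)
```

Under this scheme, an item the LCN classifies confidently and correctly receives a weight near zero, while items that resist a shallow solution retain weights near one, steering the HCN toward features the LCN cannot represent.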