Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts" - superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, which can make the most intuitive and promising solutions in one context not generalize to others. One potential way to improve o.o.d. generalization is to assume simple solutions are unlikely to be valid across contexts and avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN's predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization.
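The two-stage idea above can be sketched in a toy setting. This is a minimal illustration, not the paper's implementation: both "networks" are plain logistic regressions (so only the reweighting mechanics are shown, not the capacity difference), the data are synthetic with a hand-planted shortcut feature, and the weighting rule `1 - p_correct` is an assumed example of downweighting items the low-capacity model masters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary label; column 0 is a "shortcut" feature that is
# strongly predictive on half the items and absent (zero) on the rest;
# column 1 carries a weak "deep" signal on every item.
n = 400
y = rng.integers(0, 2, n)
has_shortcut = np.arange(n) < n // 2
X = rng.normal(0, 1, (n, 3))
X[:, 0] = np.where(has_shortcut, (2 * y - 1) * 3.0, 0.0)  # planted shortcut
X[:, 1] += (2 * y - 1) * 0.5                              # weak invariant signal

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, sample_weight=None, steps=500, lr=0.1):
    """Logistic regression by gradient descent (stand-in for a network)."""
    w = np.zeros(X.shape[1])
    sw = np.ones(len(y)) if sample_weight is None else sample_weight
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (sw * (p - y)) / sw.sum())
    return w

# Stage 1: the low-capacity model (LCN stand-in) masters shortcut items.
w_lcn = train_logreg(X, y)
p = sigmoid(X @ w_lcn)
p_correct = np.where(y == 1, p, 1 - p)

# Items the LCN finds easy get low weight (hypothetical weighting rule).
weights = 1.0 - p_correct

# Stage 2: the HCN stand-in, trained on reweighted items, leans less on
# the shortcut coordinate than the unweighted model did.
w_hcn = train_logreg(X, y, sample_weight=weights)
```

In this toy run, shortcut-bearing items receive lower weights than the rest, and the stage-2 model's coefficient on the shortcut feature shrinks relative to stage 1, mirroring the intended effect of the two-stage approach.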