Object recognition and viewpoint estimation lie at the heart of visual understanding. Recent works suggest that convolutional neural networks (CNNs) fail to generalize to out-of-distribution (OOD) category-viewpoint combinations, i.e., combinations not seen during training. In this paper, we investigate when and how such OOD generalization may be possible by evaluating CNNs trained to classify both object category and 3D viewpoint on OOD combinations, and by identifying the neural mechanisms that facilitate such OOD generalization. We show that increasing the number of in-distribution combinations (i.e., data diversity) substantially improves generalization to OOD combinations, even with the same amount of training data. We compare learning category and viewpoint in separate and shared network architectures, and observe starkly different trends on in-distribution and OOD combinations: while shared networks are helpful in-distribution, separate networks significantly outperform shared ones on OOD combinations. Finally, we demonstrate that such OOD generalization is facilitated by the neural mechanism of specialization, i.e., the emergence of two types of neurons -- neurons selective to category and invariant to viewpoint, and vice versa.
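The notion of in-distribution versus OOD category-viewpoint combinations can be made concrete with a small sketch: given a grid of categories and viewpoints, a training split sees only some cells of the grid, and the held-out cells form the OOD set. The category and viewpoint names below are illustrative placeholders, not the datasets used in the paper.

```python
import itertools

# Illustrative category and viewpoint labels (hypothetical, not from the paper).
categories = ["car", "bus", "chair", "table", "lamp"]
viewpoints = ["front", "side", "top", "back", "oblique"]

# Every cell of the category-viewpoint grid.
all_combos = set(itertools.product(categories, viewpoints))

def in_distribution_split(diversity):
    """Mark `diversity` viewpoints per category as seen during training
    (in-distribution); all remaining combinations are OOD.

    The staggered indexing ensures every viewpoint appears in training
    for some category, so both tasks remain learnable."""
    seen = {(c, viewpoints[(i + j) % len(viewpoints)])
            for i, c in enumerate(categories)
            for j in range(diversity)}
    return seen, all_combos - seen

seen, ood = in_distribution_split(2)
# 5 categories x 2 viewpoints each = 10 in-distribution cells, 15 OOD cells.
```

Increasing `diversity` while holding the total number of training images fixed models the paper's data-diversity experiment: more grid cells are covered, each with fewer images per cell.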