The robust and efficient recognition of visual relations in images is a hallmark of biological vision. We argue that, despite recent progress in visual recognition, modern machine vision algorithms are severely limited in their ability to learn visual relations. Through controlled experiments, we demonstrate that visual-relation problems strain convolutional neural networks (CNNs). The networks eventually break altogether when rote memorization becomes impossible, as when intra-class variability exceeds network capacity. Motivated by the comparable success of biological vision, we argue that feedback mechanisms, including attention and perceptual grouping, may be the key computational components underlying abstract visual reasoning.