Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands of abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the Synthetic Visual Reasoning Test (SVRT) challenge, a collection of twenty-three visual reasoning problems. Our analysis leads to a novel taxonomy of visual reasoning tasks, which can be primarily explained by both the type of relations (same-different vs. spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in humans' visual reasoning ability. To test this, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of experiments, we evaluated the ability of these attention networks to learn to solve the SVRT challenge and found the resulting architectures to be much more efficient at solving the hardest of these visual reasoning tasks. Most importantly, the corresponding improvements on individual tasks partially explained the taxonomy. Overall, this work advances our understanding of visual reasoning and yields testable neuroscience predictions regarding the need for feature-based vs. spatial attention in visual reasoning.
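To make the distinction between the two attention mechanisms concrete, the following is a minimal NumPy sketch of the general idea, not the paper's actual architecture: feature-based attention reweights feature channels globally, while spatial attention reweights locations in the feature map. The function names and the use of a query vector are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a flat array.
    e = np.exp(x - x.max())
    return e / e.sum()

def feature_attention(feature_maps, query):
    """Feature-based attention (illustrative): reweight channels
    by the similarity of each channel's pooled response to a query.
    feature_maps: (C, H, W), query: (C,)"""
    C = feature_maps.shape[0]
    # Channel descriptors via global average pooling.
    descriptors = feature_maps.reshape(C, -1).mean(axis=1)  # (C,)
    weights = softmax(descriptors * query)                  # attention over channels
    return feature_maps * weights[:, None, None]

def spatial_attention(feature_maps):
    """Spatial attention (illustrative): reweight spatial locations
    by channel-pooled saliency. feature_maps: (C, H, W)"""
    C, H, W = feature_maps.shape
    saliency = feature_maps.mean(axis=0)                    # (H, W)
    weights = softmax(saliency.ravel()).reshape(H, W)       # attention over locations
    return feature_maps * weights[None, :, :]
```

In both cases the attention weights form a distribution (they sum to one), so the operation selectively gates parts of the representation rather than adding information, which is the sense in which attention is hypothesized to help on the harder same-different tasks.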