Human reasoning is grounded in an ability to identify highly abstract commonalities governing superficially dissimilar visual inputs. Recent efforts to develop algorithms with this capacity have largely focused on approaches that require extensive direct training on visual reasoning tasks, and yield limited generalization to problems with novel content. In contrast, a long tradition of research in cognitive science has focused on elucidating the computational principles underlying human analogical reasoning; however, this work has generally relied on manually constructed representations. Here we present visiPAM (visual Probabilistic Analogical Mapping), a model of visual reasoning that synthesizes these two approaches. VisiPAM employs learned representations derived directly from naturalistic visual inputs, coupled with a similarity-based mapping operation derived from cognitive theories of human reasoning. We show that without any direct training, visiPAM outperforms a state-of-the-art deep learning model on an analogical mapping task. In addition, visiPAM closely matches the pattern of human performance on a novel task involving mapping of 3D objects across disparate categories.
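To make the core idea concrete, the following is a minimal illustrative sketch (not visiPAM's actual probabilistic mapping algorithm) of what a similarity-based mapping over learned representations can look like: node embeddings extracted from two images are compared via cosine similarity, and source nodes are paired with their most similar target nodes. The function names, the greedy assignment, and the random features are all hypothetical simplifications introduced here for illustration.

```python
# Illustrative sketch only: a bare-bones similarity-based mapping between two
# sets of learned node embeddings (e.g., object-part features extracted from
# two images). The greedy assignment below is a hypothetical simplification,
# not visiPAM's probabilistic analogical mapping procedure.
import numpy as np

def cosine_similarity_matrix(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between source (m x d) and target (n x d) embeddings."""
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    return s @ t.T

def greedy_map(sim: np.ndarray) -> dict:
    """Greedily pair each source node with its most similar unused target node."""
    mapping, used = {}, set()
    # Visit source nodes in order of their best available similarity.
    for i in np.argsort(-sim.max(axis=1)):
        candidates = [j for j in np.argsort(-sim[i]) if j not in used]
        if candidates:
            mapping[int(i)] = int(candidates[0])
            used.add(candidates[0])
    return mapping

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(4, 16))   # stand-in for learned source-image node features
    tgt = rng.normal(size=(5, 16))   # stand-in for learned target-image node features
    print(greedy_map(cosine_similarity_matrix(src, tgt)))
```

In the actual model, the mapping is computed jointly over node and relation similarities rather than greedily per node; this sketch only conveys the general idea of aligning two structured visual representations by feature similarity without any task-specific training.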