In this paper, we study the problem of geometric reasoning in the context of question answering. We introduce the Dynamic Spatial Memory Network (DSMN), a new deep network architecture designed to answer questions that admit latent visual representations. DSMN learns to generate and reason over such representations. Further, we propose two synthetic benchmarks, FloorPlanQA and ShapeIntersection, to evaluate the geometric reasoning capability of QA systems. Experimental results validate the effectiveness of DSMN on visual thinking tasks.