Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on the ability to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules based on those concepts (i.e., the reasoning). However, learning both representation and reasoning is a challenging and ill-posed task, often approached in a stage-wise manner (first representation, then reasoning). In this work, we propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together. Specifically, we introduce a general generative graphical model for RPMs, GM-RPM, and apply it to solve the reasoning test. We accomplish this using a novel learning framework, the Disentangling-based Abstract Reasoning Network (DAReN), built on the principles of GM-RPM. We perform an empirical evaluation of DAReN on several benchmark datasets. DAReN shows consistent improvement over state-of-the-art (SOTA) models on both the reasoning and the disentanglement tasks. This demonstrates a strong correlation between disentangled latent representations and the ability to solve abstract visual reasoning tasks.