Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on their ability to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules based on those concepts (i.e., the reasoning). However, learning both representation and reasoning is a challenging and ill-posed task, often approached in a stage-wise manner (first representation, then reasoning). In this work, we propose an end-to-end joint representation-reasoning learning framework, which leverages a weak form of inductive bias to improve both tasks together. Specifically, we propose a general generative graphical model for RPMs, GM-RPM, and apply it to solve the reasoning test. We accomplish this using a novel learning framework, the Disentangling-based Abstract Reasoning Network (DAReN), built on the principles of GM-RPM. We evaluate DAReN empirically on several benchmark datasets. DAReN shows consistent improvement over state-of-the-art (SOTA) models on both the reasoning and the disentanglement tasks. These results indicate a strong correlation between disentangled latent representations and the ability to solve abstract visual reasoning tasks.