For a long time the ability to solve abstract reasoning tasks was considered one of the hallmarks of human intelligence. Recent advances in application of deep learning (DL) methods led, as in many other domains, to surpassing human abstract reasoning performance, specifically in the most popular type of such problems - the Raven's Progressive Matrices (RPMs). While the efficacy of DL systems is indeed impressive, the way they approach the RPMs is very different from that of humans. State-of-the-art systems solving RPMs rely on massive pattern-based training and sometimes on exploiting biases in the dataset, whereas humans concentrate on identification of the rules / concepts underlying the RPM (or generally a visual reasoning task) to be solved. Motivated by this cognitive difference, this work aims at combining DL with human way of solving RPMs and getting the best of both worlds. Specifically, we cast the problem of solving RPMs into multi-label classification framework where each RPM is viewed as a multi-label data point, with labels determined by the set of abstract rules underlying the RPM. For efficient training of the system we introduce a generalisation of the Noise Contrastive Estimation algorithm to the case of multi-label samples. Furthermore, we propose a new sparse rule encoding scheme for RPMs which, besides the new training algorithm, is the key factor contributing to the state-of-the-art performance. The proposed approach is evaluated on two most popular benchmark datasets (Balanced-RAVEN and PGM) and on both of them demonstrates an advantage over the current state-of-the-art results. Contrary to applications of contrastive learning methods reported in other domains, the state-of-the-art performance reported in the paper is achieved with no need for large batch sizes or strong data augmentation.
翻译:在很长一段时间里,解决抽象推理任务的能力被认为是人类情报的标志之一。最近应用深层次学习(DL)方法的进展,正如在许多其他领域一样,导致超越人类抽象推理性能,特别是在最受欢迎的这类问题类型 — — 乌鸦的进步矩阵(RPM ) 。虽然DL系统的效率确实令人印象深刻,但它们与RPM的处理方式与人类非常不同。解决RPM的最新系统依赖于大规模基于模式的培训,有时还利用数据集中的偏差,而人类则专注于确定RPM(或一般是一个视觉推理任务)所要解决的规则/概念。由于这种认知差异的驱动,这项工作的目的是将DL与解决RPM的人类方法相结合。具体地说,我们把解决RPMM方法的问题放到多标签分类框架中,每个RPM方法都被视为一个多层次的数据基点,由RPM背后的一组抽象规则所决定的标签,而对于当前RPMM(或一般直观推论)应用规则的原理/概念应用规则,我们将一个高层次的新的标准化方法用于更新的系统。