Visual reasoning is a long-standing goal of vision research. Over the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of how well the learned relations generalize. In recent years, several innovations in DNNs have been developed to enable learning abstract relations from images. In this work, we systematically evaluate a series of DNNs that integrate mechanisms such as slot attention, recurrently guided attention, and external memory on the simplest possible visual reasoning task: deciding whether two objects are the same or different. We found that, although some models generalized the same-different relation to specific types of images better than others, no model was able to generalize this relation across the board. We conclude that abstract visual reasoning remains a largely unresolved challenge for DNNs.
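The same-different task described above can be illustrated with a minimal sketch (a hypothetical stimulus generator, not the authors' actual dataset): each sample is an image containing two square objects, labeled 1 if the squares share the same size and 0 otherwise.

```python
# Minimal same-different stimulus sketch (illustrative only; the helper name
# `make_sample` and the square-based stimuli are assumptions, not the paper's).
import numpy as np

rng = np.random.default_rng(0)

def make_sample(same: bool, canvas: int = 32):
    """Draw two squares on a blank canvas; return (image, label)."""
    img = np.zeros((canvas, canvas), dtype=np.float32)
    size_a = int(rng.integers(3, 8))
    size_b = size_a if same else int(rng.choice([s for s in range(3, 8) if s != size_a]))
    # Place one square in each half of the canvas so they never overlap.
    for size, x0 in ((size_a, 2), (size_b, canvas // 2 + 2)):
        y0 = int(rng.integers(0, canvas - size))
        img[y0:y0 + size, x0:x0 + size] = 1.0
    return img, int(same)

img, label = make_sample(same=True)
```

A model is then trained to predict the label from the image alone; generalization is tested on object types never seen during training.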
Title: The role of object-centric representations, guided attention, and external memory in the generalization of visual relations