Supervised deep convolutional neural networks (DCNNs) are currently among the best computational models for explaining how the primate ventral visual stream solves object recognition. However, existing visual processing models do not take embodied cognition into account. From an ecological standpoint, humans learn to recognize objects by interacting with them, which supports better classification, specialization, and generalization. Here, we ask whether computational models trained under an embodied learning framework can explain the mechanisms underlying object recognition in the primate visual system better than existing supervised models. To address this question, we use reinforcement learning to train neural network models to play a 3D computer game, and we find that these reinforcement learning models achieve neural response prediction accuracies in early visual areas (e.g., V1 and V2) that are comparable to those of the supervised neural network models. In contrast, the supervised models yield better neural response predictions in higher visual areas than the reinforcement learning models. Our preliminary results suggest a future direction for visual neuroscience in which deep reinforcement learning is incorporated to address the missing concept of embodiment.
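The abstract reports neural response prediction accuracies for model layers against visual areas such as V1 and V2. As a rough, hedged illustration only (not the authors' actual pipeline), the sketch below shows how such predictivity is commonly scored in Brain-Score-style benchmarks: cross-validated ridge regression maps layer activations to recorded neural responses, and the score is the per-neuron correlation between predicted and held-out responses. All shapes and variable names are hypothetical placeholders using synthetic data.

```python
# Illustrative neural-predictivity sketch with synthetic data; in practice
# `activations` would come from a DCNN or RL-trained network layer and
# `responses` from primate recordings (e.g., V1, V2, V4, IT).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_images, n_features, n_neurons = 200, 512, 50

activations = rng.standard_normal((n_images, n_features))
responses = (activations @ rng.standard_normal((n_features, n_neurons))) * 0.1 \
    + rng.standard_normal((n_images, n_neurons))

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(activations):
    # Fit a linear map from model features to neural responses on the training folds.
    model = Ridge(alpha=1.0).fit(activations[train_idx], responses[train_idx])
    pred = model.predict(activations[test_idx])
    # Per-neuron Pearson correlation between predicted and held-out responses.
    for n in range(n_neurons):
        scores.append(np.corrcoef(pred[:, n], responses[test_idx, n])[0, 1])

print(f"median neural predictivity (r): {np.median(scores):.3f}")
```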