Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements

Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision and have evolved into the best current computational models of object recognition. Comparisons with neuroimaging and neural time series data consequently indicate strong architectural and functional parallels with the ventral visual pathway. As recent advances in deep learning seem to decrease this similarity, computational neuroscience is challenged to reverse-engineer biological plausibility in order to obtain useful models. While previous studies have shown that biologically inspired architectures can increase the human-likeness of models, in this study we investigate a purely data-driven approach. We use human eye tracking data to directly modify training examples and thereby guide the models' visual attention during object recognition in natural images either towards or away from the focus of human fixations. We validate the different manipulation types (i.e., standard, human-like, and non-human-like attention) by comparing GradCAM saliency maps with eye tracking data from human participants. Our results demonstrate that the proposed guided focus manipulation works as intended in the negative direction: non-human-like models focus on image parts that are significantly dissimilar from those fixated by humans. The observed effects were highly category-specific, enhanced by animacy and face presence, developed only after feedforward processing was completed, and indicated a strong influence on face detection. With this approach, however, no significantly increased human-likeness was found. We discuss possible applications of overt visual attention in DCNNs and further implications for theories of face detection.
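To make the idea of guiding attention through modified training examples more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the images were manipulated or which similarity measure was used, so everything here is an assumption made for illustration: the function names (`fixation_density`, `guide_image`, `map_similarity`), the Gaussian fixation map with width `sigma`, the blending of non-attended regions toward the mean image colour, and the use of Pearson correlation to compare a saliency map with a fixation map. No GradCAM computation is included; any 2D saliency array could be passed to `map_similarity`.

```python
import numpy as np


def fixation_density(fixations, shape, sigma=20.0):
    """Sum isotropic Gaussians centred on (row, col) fixations; scale to [0, 1].

    Assumption: fixations are pixel coordinates and sigma is in pixels.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in fixations:
        density += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    peak = density.max()
    return density / peak if peak > 0 else density


def guide_image(image, density, toward=True):
    """Blend an H x W x C image with its mean colour.

    toward=True keeps detail near human fixations ("human-like");
    toward=False keeps detail away from them ("non-human-like").
    Assumption: mean-colour blending stands in for the unspecified manipulation.
    """
    weight = density if toward else 1.0 - density
    weight = weight[..., None]  # broadcast the 2D weight over colour channels
    background = image.mean(axis=(0, 1), keepdims=True)
    return weight * image + (1.0 - weight) * background


def map_similarity(map_a, map_b):
    """Pearson correlation between two 2D maps of equal shape (assumed metric)."""
    return float(np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1])
```

In this sketch, a "standard" model would be trained on the unmodified image, a "human-like" model on `guide_image(image, density, toward=True)`, and a "non-human-like" model on `guide_image(image, density, toward=False)`. A model's saliency map could then be scored against the human fixation map with `map_similarity`.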
