Existing models of human visual attention generally cannot incorporate direct task guidance, and therefore cannot model an intent or goal when exploring a scene. To integrate guidance from any downstream visual task into attention modeling, we propose the Neural Visual Attention (NeVA) algorithm. To this end, we impose the biological constraint of foveated vision on neural networks and train an attention mechanism to generate visual explorations that maximize performance on the downstream task. We observe that biologically constrained neural networks generate human-like scanpaths without being trained for this objective. Extensive experiments on three common benchmark datasets show that our method outperforms state-of-the-art unsupervised human attention models in generating human-like scanpaths.
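As a rough illustration of the idea described above (foveated input plus fixations chosen to help a downstream task), the following is a minimal sketch under stated assumptions, not the NeVA implementation. The box blur, the Gaussian foveation mask, the greedy search over candidate fixations (used here in place of a trained attention mechanism), the toy reconstruction loss, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def box_blur(img, k=5):
    """Crude stand-in for peripheral blur: k x k mean filter (k odd), edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def foveate(img, blurred, fixations, sigma=2.0):
    """Keep the image sharp near each fixation (y, x) and blurred elsewhere."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w))
    for fy, fx in fixations:
        g = np.exp(-((ys - fy) ** 2 + (xs - fx) ** 2) / (2 * sigma ** 2))
        mask = np.maximum(mask, g)  # accumulate information across fixations
    return mask * img + (1.0 - mask) * blurred

def greedy_scanpath(img, task_loss, candidates, n_fix=2, sigma=2.0, k=5):
    """Assumption: greedy search replaces the trained attention mechanism.
    Each step adds the candidate fixation that minimizes the task loss."""
    blurred = box_blur(img, k)
    path = []
    for _ in range(n_fix):
        best = min(
            candidates,
            key=lambda c: task_loss(foveate(img, blurred, path + [c], sigma)),
        )
        path.append(best)
    return path

# Toy usage: a single bright pixel and a reconstruction-style loss (hypothetical task).
img = np.zeros((10, 10))
img[6, 3] = 1.0
candidates = [(y, x) for y in range(10) for x in range(10)]
path = greedy_scanpath(img, lambda seen: np.mean((seen - img) ** 2), candidates)
print(path)
```

In this toy setting the loss rewards fixations that reveal the parts of the image the blur has degraded most, which is the sense in which the scanpath is guided by the task rather than by a saliency objective.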