A visual scanpath is the sequence of fixation points that the human gaze traverses while observing an image, and predicting it helps in modeling visual attention over that image. To this end, several models have been proposed in the literature using complex deep learning architectures and frameworks. Here, we explore the efficiency of using common deep learning architectures in a simple, fully convolutional regressive manner. We evaluate how well these models predict scanpaths on two datasets. We compare them with other models using different metrics and show competitive results that sometimes surpass those of previous, more complex architectures. We also compare the backbone architectures we used, based on their performance in the experiments, to determine which are most suitable for the task.