Recent years have seen a surge in models predicting the scanpaths of fixations made by humans when viewing images. However, the field is lacking a principled comparison of those models with respect to their predictive power. In the past, models have usually been evaluated by comparing human scanpaths to scanpaths generated from the model. Here, we instead evaluate models based on how well they predict each fixation in a scanpath given the previous scanpath history. This makes model evaluation closely aligned with the biological processes thought to underlie scanpath generation and makes it possible to apply established saliency metrics like AUC and NSS in an intuitive and interpretable way. We evaluate many existing models of scanpath prediction on the datasets MIT1003, MIT300, CAT2000 train and CAT2000 test, for the first time giving a detailed picture of the current state of the art of human scanpath prediction. We also show that the discussed method of model benchmarking allows for more detailed analyses, leading to interesting insights about where and when models fail to predict human behaviour. The MIT/Tuebingen Saliency Benchmark will implement the evaluation of scanpath models as detailed here, allowing researchers to score their models on the established benchmark datasets MIT300 and CAT2000.
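As a rough illustration of the evaluation idea described above (not the benchmark's actual implementation), the sketch below computes an NSS-style score for a single fixation from a model's predicted conditional fixation density; the function name, the assumed `model.conditional_density` call, and the way scores are aggregated are hypothetical.

```python
import numpy as np

def conditional_nss(density_map, fixation_x, fixation_y):
    """NSS of one fixation under a model's predicted density.

    density_map: 2D array, the model's predicted density of the *next*
        fixation given the image and the scanpath history so far.
    fixation_x, fixation_y: pixel coordinates of the fixation the
        human observer actually made next.
    """
    # Normalize the predicted map to zero mean and unit variance ...
    z = (density_map - density_map.mean()) / density_map.std()
    # ... and read off its value at the ground-truth fixation location.
    return z[int(fixation_y), int(fixation_x)]

# Hypothetical usage: score every fixation of a scanpath conditioned on
# its history, then average the per-fixation scores.
# scores = []
# for i, (x, y) in enumerate(scanpath):
#     density = model.conditional_density(image, history=scanpath[:i])  # assumed model API
#     scores.append(conditional_nss(density, x, y))
# mean_nss = np.mean(scores)
```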