The human prioritization of image regions can be modeled in a time-invariant fashion with saliency maps or sequentially with scanpath models. However, while both types of models have steadily improved on several benchmarks and datasets, a considerable gap remains in predicting human gaze. Here, we leverage two recent developments to reduce this gap: theoretical analyses establishing a principled framework for predicting the next gaze target, and the empirical measurement of the human cost of gaze switches independently of image content. We introduce an algorithm in the framework of sequential decision making that converts any static saliency map into a sequence of dynamic, history-dependent value maps, which are recomputed after each gaze shift. These maps are based on 1) a saliency map provided by an arbitrary saliency model, 2) the recently measured human cost function quantifying preferences in the magnitude and direction of eye movements, and 3) a sequential exploration bonus, which changes with each subsequent gaze shift. The parameters governing the spatial extent and temporal decay of this exploration bonus are estimated from human gaze data. The relative contributions of these three components were optimized on the MIT1003 dataset for the NSS score. The resulting value maps significantly outperform five state-of-the-art saliency models in predicting the next gaze target, as measured by NSS and AUC scores on three image datasets. Thus, we provide an implementation of human gaze preferences that can be used to improve arbitrary saliency models' predictions of humans' next gaze targets.
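The combination of the three components can be illustrated with a minimal sketch. It is not the implementation described above: the linear weighting, the amplitude-only cost (the measured human cost also depends on direction), the Gaussian spatial profile and geometric temporal decay of the exploration term, and all names and default values (`saccade_cost`, `exploration_bonus`, `w_cost`, `sigma`, `decay`, and so on) are assumptions chosen only to show how a static saliency map might be turned into history-dependent value maps.

```python
import numpy as np


def gaussian_bump(shape, center, sigma):
    """Isotropic Gaussian with peak value 1 at center = (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def saccade_cost(shape, fixation, scale=50.0):
    """Hypothetical cost: grows linearly with saccade amplitude in pixels.

    Assumption: direction preferences are ignored in this sketch.
    """
    rows, cols = np.indices(shape)
    return np.hypot(rows - fixation[0], cols - fixation[1]) / scale


def exploration_bonus(shape, history, sigma=5.0, decay=0.7):
    """Hypothetical exploration term over past fixations (oldest first).

    Visited locations are devalued, so unvisited regions gain a relative
    bonus. The effect of each fixation fades by `decay` per gaze shift.
    """
    bonus = np.zeros(shape)
    for age, fix in enumerate(reversed(history)):
        bonus -= (decay ** age) * gaussian_bump(shape, fix, sigma)
    return bonus


def value_map(saliency, history, w_cost=0.2, w_explore=1.0):
    """Combine saliency, gaze-shift cost, and exploration term (assumed linear)."""
    current = history[-1]
    return (saliency
            - w_cost * saccade_cost(saliency.shape, current)
            + w_explore * exploration_bonus(saliency.shape, history))


def predict_scanpath(saliency, start, n_steps, **weights):
    """Pick the maximum of the value map, recomputed after each gaze shift."""
    history = [tuple(start)]
    for _ in range(n_steps):
        v = value_map(saliency, history, **weights)
        nxt = np.unravel_index(np.argmax(v), v.shape)
        history.append(tuple(int(i) for i in nxt))
    return history[1:]
```

With the cost and exploration weights set to zero, the sketch reduces to repeatedly selecting the global maximum of the static saliency map. With nonzero weights, the predicted target moves away from the current fixation, which is the qualitative behavior the value maps are meant to capture.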