Understanding and predicting the human visual attentional mechanism is an active area of research in neuroscience and computer vision. In this work, we propose DeepFix, a first-of-its-kind fully convolutional neural network for accurate saliency prediction. Unlike classical works, which characterize the saliency map using various hand-crafted features, our model automatically learns features in a hierarchical fashion and predicts the saliency map in an end-to-end manner. DeepFix is designed to capture semantics at multiple scales while taking global context into account, using network layers with very large receptive fields. Fully convolutional networks are generally spatially invariant, which prevents them from modeling location-dependent patterns (e.g. centre bias). Our network overcomes this limitation by incorporating a novel Location Biased Convolutional layer. We evaluate our model on two challenging eye fixation datasets, MIT300 and CAT2000, and show that it outperforms other recent approaches by a significant margin.
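The abstract does not spell out how the Location Biased Convolutional layer is built. The following is a minimal sketch of one way to make a convolution location-dependent, under the assumption that fixed, centred Gaussian maps are appended to the input as extra channels so that the learned weights on those channels can express a centre bias. The function names (`gaussian_prior`, `conv2d_same`, `location_biased_conv`), the `sigmas` values, and the toy kernels are all hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def gaussian_prior(height, width, sigma):
    """Centred 2D Gaussian map with values in (0, 1].

    Coordinates are normalised by half the image size, so sigma is
    expressed relative to the image extent (an illustrative choice).
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    prior = []
    for y in range(height):
        row = []
        for x in range(width):
            dy = (y - cy) / (height / 2.0)
            dx = (x - cx) / (width / 2.0)
            row.append(math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)))
        prior.append(row)
    return prior


def conv2d_same(channels, kernels, bias=0.0):
    """Single-output 'same' cross-correlation with zero padding.

    channels: list of H x W maps; kernels: one k x k kernel per channel.
    The same kernels are applied at every position, so the operation
    itself is spatially invariant.
    """
    height, width = len(channels[0]), len(channels[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[bias] * width for _ in range(height)]
    for ch, ker in zip(channels, kernels):
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for i in range(k):
                    for j in range(k):
                        yy, xx = y + i - pad, x + j - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += ker[i][j] * ch[yy][xx]
                out[y][x] += acc
    return out


def location_biased_conv(features, feature_kernels, prior_kernels, sigmas, bias=0.0):
    """Convolve features concatenated with fixed Gaussian prior channels.

    ASSUMPTION: location dependence is injected by the extra channels,
    not by changing the convolution; prior_kernels would be learned.
    """
    height, width = len(features[0]), len(features[0][0])
    priors = [gaussian_prior(height, width, s) for s in sigmas]
    return conv2d_same(features + priors, feature_kernels + prior_kernels, bias)


# Toy check: a constant 5 x 5 feature map and 1 x 1 kernels.
features = [[[1.0] * 5 for _ in range(5)]]
plain = conv2d_same(features, [[[1.0]]])
biased = location_biased_conv(features, [[[1.0]]], [[[1.0]]], sigmas=[0.5])
print("plain  centre/corner:", plain[2][2], plain[0][0])
print("biased centre/corner:", biased[2][2], biased[0][0])
```

With a constant input, the plain convolution returns the same response at the centre and at the corner, while the version with the appended Gaussian channel responds more strongly at the centre. In a trained network the weights on the prior channels would be learned along with the rest of the filters; here they are fixed to 1 to keep the example small.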