Deep neural networks have had a profound impact on achieving human-level performance in visual saliency prediction. However, it is still unclear how they learn the task and what this implies for understanding the human visual system. In this work, we develop a technique to derive explainable saliency models from their corresponding deep neural architecture-based saliency models by applying human perception theories and conventional concepts of saliency. This technique helps us understand what the deep network learns at its intermediate layers through their activation maps. Initially, we consider two state-of-the-art deep saliency models, namely UNISAL and MSI-Net, for our interpretation. We use a set of biologically plausible log-Gabor filters to identify and reconstruct their activation maps within our explainable saliency model. The final saliency map is generated from these reconstructed activation maps. We also build our own deep saliency model, named the cross-concatenated multi-scale residual block based network (CMRNet), for saliency prediction. We then evaluate and compare the performance of the explainable models derived from UNISAL, MSI-Net and CMRNet with other state-of-the-art methods on three benchmark datasets. Finally, we propose that this approach to explainability can be applied to any deep visual saliency model for interpretation, which makes it generic.
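To make the role of the log-Gabor filter bank concrete, the following is a minimal sketch of how a small bank of frequency-domain log-Gabor filters could be built and applied to a single intermediate activation map. The function names (`log_gabor_filter`, `filter_activation_map`), the filter-bank sizes, and the parameter values (e.g. `sigma_on_f=0.55`, three wavelengths, six orientations) are illustrative assumptions for exposition, not the exact bank used in the paper.

```python
import numpy as np

def log_gabor_filter(size, wavelength, orientation,
                     sigma_on_f=0.55, sigma_theta=np.pi / 8):
    """Build one log-Gabor filter in the frequency domain.

    size        : (rows, cols) of the activation map to be analysed
    wavelength  : centre wavelength in pixels (centre frequency = 1/wavelength)
    orientation : filter orientation in radians
    (parameter defaults are illustrative assumptions)
    """
    rows, cols = size
    # Normalised frequency coordinates, shifted so DC sits at the centre.
    u = np.fft.fftshift(np.fft.fftfreq(cols))
    v = np.fft.fftshift(np.fft.fftfreq(rows))
    uu, vv = np.meshgrid(u, v)
    radius = np.sqrt(uu ** 2 + vv ** 2)
    radius[rows // 2, cols // 2] = 1.0          # avoid log(0) at DC
    theta = np.arctan2(-vv, uu)

    f0 = 1.0 / wavelength
    # Radial component: Gaussian on a logarithmic frequency axis.
    radial = np.exp(-(np.log(radius / f0) ** 2) /
                    (2 * np.log(sigma_on_f) ** 2))
    radial[rows // 2, cols // 2] = 0.0          # no DC response
    # Angular component: Gaussian spread around the chosen orientation.
    dtheta = np.arctan2(np.sin(theta - orientation),
                        np.cos(theta - orientation))
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
    return radial * angular

def filter_activation_map(act_map, wavelengths=(4, 8, 16), n_orientations=6):
    """Decompose one 2-D activation map with the log-Gabor bank and
    return the per-filter response magnitudes."""
    spectrum = np.fft.fftshift(np.fft.fft2(act_map))
    responses = []
    for wl in wavelengths:
        for k in range(n_orientations):
            g = log_gabor_filter(act_map.shape, wl,
                                 k * np.pi / n_orientations)
            resp = np.fft.ifft2(np.fft.ifftshift(spectrum * g))
            responses.append(np.abs(resp))
    return responses
```

Under these assumptions, the response magnitudes for each scale and orientation can be compared against an intermediate activation map of UNISAL, MSI-Net or CMRNet to identify which biologically plausible filters best account for it, and then recombined to reconstruct that map for the explainable model.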