Thanks to its ability to provide an immersive and interactive experience, 360-degree image content has seen rapid uptake in consumer and industrial applications. Compared with planar 2D images, saliency prediction for 360-degree images is more challenging due to their high resolution and spherical viewing range. Currently, most high-performance saliency prediction models for omnidirectional images (ODIs) rely on deeper or broader convolutional neural networks (CNNs), which benefit from CNNs' superior feature representation capability but suffer from their high computational cost. In this paper, inspired by the human visual cognitive process, i.e., that human perception of a visual scene is always accomplished through multiple stages of analysis, we propose a novel multi-stage recurrent generative adversarial network for ODIs, dubbed MRGAN360, to predict saliency maps stage by stage. At each stage, the prediction model takes as input the original image and the output of the previous stage, and outputs a more accurate saliency map. We employ a recurrent neural network between adjacent prediction stages to model their correlations, and exploit a discriminator at the end of each stage to supervise the output saliency map. In addition, we share the weights among all stages to obtain a lightweight architecture that is computationally cheap. Extensive experiments demonstrate that our proposed model outperforms state-of-the-art models in terms of both prediction accuracy and model size.
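The stage-wise refinement loop described above can be sketched as follows. This is a toy NumPy illustration of the control flow only, not the paper's CNN/RNN/GAN implementation; the stage count, shapes, and the linear "stage" function are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights reused at every stage (hypothetical shape: the two input
# channels are the image and the previous stage's saliency map).
W = rng.standard_normal((2, 1)) * 0.1

def refine_stage(image, prev_saliency, W):
    """One prediction stage: takes the original image plus the previous
    stage's saliency map and outputs a refined saliency map."""
    x = np.stack([image, prev_saliency], axis=-1)  # concatenate the two inputs
    logits = x @ W                                 # shared (stage-invariant) mapping
    return 1.0 / (1.0 + np.exp(-logits[..., 0]))   # sigmoid -> values in [0, 1]

def predict_saliency(image, num_stages=3):
    saliency = np.zeros_like(image)  # the first stage starts from an empty map
    for _ in range(num_stages):      # the same weights W are shared across stages
        saliency = refine_stage(image, saliency, W)
    return saliency

image = rng.random((8, 16))          # toy stand-in for an equirectangular ODI
out = predict_saliency(image)
print(out.shape)                     # same spatial size as the input
```

In the actual model each stage is a CNN refined by a recurrent connection and supervised by a per-stage discriminator; weight sharing across stages is what keeps the multi-stage architecture lightweight.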