As moving objects naturally attract human visual attention, temporal motion information is commonly exploited alongside spatial information to detect salient objects in videos. Although efficient tools such as optical flow have been proposed to extract temporal motion information, optical flow often becomes unreliable for saliency detection due to camera motion or partial movement of salient objects. In this paper, we investigate the complementary roles of spatial and temporal information and propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatiotemporal information. We construct a symmetric two-bypass network to explicitly extract spatial and temporal features. A dynamic weight generator (DWG) is designed to automatically learn the reliability of each saliency branch, and a top-down cross attentive aggregation (CAA) procedure facilitates dynamic complementary aggregation of spatiotemporal features. Finally, the features are refined by spatial attention under the guidance of a coarse saliency map and passed through the decoder to produce the final saliency map. Experimental results on five benchmarks, VOS, DAVIS, FBMS, SegTrack-v2, and ViSal, demonstrate that the proposed method outperforms state-of-the-art algorithms. The source code is available at https://github.com/TJUMMG/DS-Net.
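To make the dynamic-weighting idea concrete, below is a minimal PyTorch sketch of a DWG-style fusion module: it pools each branch's features into a global descriptor, predicts two competing reliability weights with a small MLP, and fuses the branches by a weighted sum. The layer sizes, the softmax weighting, and the class name `DynamicWeightGenerator` are illustrative assumptions, not the paper's exact design; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicWeightGenerator(nn.Module):
    """Hypothetical sketch of DWG-style fusion: predict per-branch
    reliability weights from globally pooled spatial and temporal
    features, then fuse the branches by a weighted sum."""

    def __init__(self, channels: int):
        super().__init__()
        # A small MLP maps the concatenated branch descriptors to two logits
        # (assumed structure; the paper's generator may differ).
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
        )

    def forward(self, f_spatial: torch.Tensor, f_temporal: torch.Tensor):
        # Global average pooling gives one descriptor per branch: (B, C).
        d_s = F.adaptive_avg_pool2d(f_spatial, 1).flatten(1)
        d_t = F.adaptive_avg_pool2d(f_temporal, 1).flatten(1)
        logits = self.mlp(torch.cat([d_s, d_t], dim=1))
        # Softmax makes the two reliability weights compete and sum to 1,
        # so an unreliable branch (e.g. noisy optical flow under camera
        # motion) is automatically down-weighted.
        w = torch.softmax(logits, dim=1)  # (B, 2)
        w_s, w_t = w[:, 0], w[:, 1]
        # Weighted fusion, broadcasting the scalars over C, H, W.
        fused = (w_s.view(-1, 1, 1, 1) * f_spatial
                 + w_t.view(-1, 1, 1, 1) * f_temporal)
        return fused, w


if __name__ == "__main__":
    dwg = DynamicWeightGenerator(channels=64)
    f_s = torch.randn(2, 64, 32, 32)  # spatial-branch features
    f_t = torch.randn(2, 64, 32, 32)  # temporal-branch features
    fused, weights = dwg(f_s, f_t)
    print(fused.shape, weights)       # torch.Size([2, 64, 32, 32]), (2, 2) weights
```

The design choice the sketch illustrates is the one motivated in the abstract: rather than fusing the two streams with fixed weights, the network learns per-sample weights, so frames where optical flow is degraded can lean on the spatial branch and vice versa.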