Video salient object detection (VSOD) aims to locate and segment the most attractive object in a video by exploiting the spatial and temporal cues hidden in the sequence. However, these cues are often unreliable in real-world scenarios involving low-contrast foregrounds, fast motion, or multiple moving objects. To address these problems, we propose a new framework that adaptively captures the available information from spatial and temporal cues. It consists of Confidence-guided Adaptive Gate (CAG) modules and Dual Differential Enhancement (DDE) modules. For both RGB features and optical flow features, CAG estimates confidence scores, supervised by the IoU between the predictions and the ground truth, and uses them to re-calibrate the information through a gate mechanism. DDE captures differential feature representations to enrich the spatial and temporal information and to generate the fused features. Experimental results on four widely used datasets demonstrate the effectiveness of the proposed method against thirteen state-of-the-art methods.
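The IoU-supervised gating idea behind CAG can be illustrated with a minimal sketch. The abstract does not give the module's architecture, so everything here is an assumption made for illustration: the function names `iou_target` and `gate` are hypothetical, the 0.5 binarization threshold is arbitrary, feature maps are flattened to plain lists, and the confidence is applied as a single scalar multiplier. In the actual method the confidence score would be predicted by a learned branch and only supervised by the IoU. Here the IoU value is computed directly to show what that supervision target looks like.

```python
def iou_target(pred, gt, threshold=0.5):
    """IoU between a binarized soft prediction and a binary ground-truth mask.

    Both inputs are flat lists of per-pixel values in [0, 1].
    The threshold is an illustrative assumption, not taken from the paper.
    """
    p = [v >= threshold for v in pred]
    g = [v >= 0.5 for v in gt]
    inter = sum(1 for a, b in zip(p, g) if a and b)
    union = sum(1 for a, b in zip(p, g) if a or b)
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return inter / union if union else 1.0


def gate(features, confidence):
    """Re-calibrate features by scaling them with a confidence score in [0, 1].

    A scalar gate is the simplest possible form (an assumption of this
    sketch); low confidence suppresses an unreliable cue.
    """
    return [confidence * f for f in features]


# Hypothetical example: a prediction made from optical flow features
# covers two of the three foreground pixels.
flow_pred = [0.9, 0.8, 0.2, 0.1]
ground_truth = [1, 1, 1, 0]
confidence = iou_target(flow_pred, ground_truth)

flow_features = [1.0, 2.0, 3.0]
gated_flow = gate(flow_features, confidence)
```

Under these assumptions, a cue whose prediction overlaps poorly with the ground truth receives a small target score, so a gate trained toward that target would down-weight that cue before fusion.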