Video salient object detection models trained with pixel-wise dense annotations have achieved excellent performance, yet obtaining pixel-by-pixel annotated datasets is laborious. Several works attempt to use scribble annotations to mitigate this problem, but point supervision, a more labor-saving annotation method (arguably the most labor-saving among manual annotation methods for dense prediction), has not been explored. In this paper, we propose a strong baseline model based on point supervision. To infer saliency maps with temporal information, we mine inter-frame complementary information from short-term and long-term perspectives, respectively. Specifically, we propose a hybrid token attention module, which mixes optical flow and image information along orthogonal dimensions, adaptively highlighting critical optical flow information (channel dimension) and critical token information (spatial dimension). To exploit long-term cues, we develop the Long-term Cross-Frame Attention module (LCFA), which assists the current frame in inferring salient objects based on multi-frame tokens. Furthermore, we contribute two point-supervised datasets, P-DAVIS and P-DAVSOD, by relabeling the DAVIS and DAVSOD datasets. Experiments on six benchmark datasets show that our method outperforms previous state-of-the-art weakly supervised methods and is even comparable with some fully supervised approaches. Source code and datasets are available.
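The abstract does not give the LCFA formulation, but the general idea of letting current-frame tokens attend to tokens pooled from multiple reference frames can be sketched as generic cross-frame attention. The following NumPy sketch is illustrative only: the function and variable names (`cross_frame_attention`, `cur_tokens`, `ref_tokens`) are our own, and the identity Q/K/V projections stand in for learned linear layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(cur_tokens, ref_tokens, d):
    # cur_tokens: (N, d) tokens of the current frame (queries).
    # ref_tokens: (T*N, d) tokens gathered from T reference frames (keys/values).
    # Identity projections stand in for learned Q/K/V layers in this sketch.
    q, k, v = cur_tokens, ref_tokens, ref_tokens
    attn = softmax(q @ k.T / np.sqrt(d))  # (N, T*N) cross-frame affinities
    return attn @ v                       # (N, d) updated current-frame tokens
```

In a real model the reference tokens would come from feature maps of past frames, so each current-frame token aggregates long-term evidence about the salient object across the clip.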