Previous video object segmentation approaches mainly rely on simplex (one-way) solutions linking appearance and motion, which limits effective feature collaboration between and across these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, introducing a better mutual-restraint scheme between motion and appearance when exploiting cross-modal features in the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update inconsistent features in the spatial-temporal embeddings, we adopt a bidirectional purification module (BPM) after the RCAM. Extensive experiments on five popular benchmarks show that FSNet is robust to various challenging scenarios (e.g., motion blur, occlusion) and performs favourably against existing cutting-edge methods on both the video object segmentation and video salient object detection tasks. The project is publicly available at: https://dpfan.net/FSNet.
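The abstract does not spell out the internals of the RCAM or BPM. Purely as an illustration of the core idea of bidirectional (full-duplex) message passing between the appearance and motion streams, the minimal PyTorch-style sketch below exchanges channel-level context between the two branches. The class name `BidirectionalCrossAttention`, the gating scheme, and all parameter names are assumptions made for this example only, not the authors' implementation.

```python
# Minimal sketch of bidirectional appearance-motion feature exchange.
# NOT the authors' RCAM/BPM; gating scheme and names are illustrative assumptions.
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    """Exchange messages between appearance and motion feature maps.

    Each modality derives channel weights from the other modality's global
    context, so information flows in both directions ("full-duplex") rather
    than only from motion to appearance or vice versa.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Lightweight gates that map one modality's pooled context to
        # channel weights applied to the other modality's features.
        self.gate_from_app = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.gate_from_mot = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, app: torch.Tensor, mot: torch.Tensor):
        # app, mot: (B, C, H, W) appearance / motion features at one scale.
        b, c, _, _ = app.shape
        app_ctx = self.pool(app).view(b, c)  # global appearance context
        mot_ctx = self.pool(mot).view(b, c)  # global motion context

        # Cross-modal channel re-weighting with a residual connection:
        # each branch receives the other branch's features, modulated by
        # its own context, and adds them to its original features.
        app_out = app + mot * self.gate_from_app(app_ctx).view(b, c, 1, 1)
        mot_out = mot + app * self.gate_from_mot(mot_ctx).view(b, c, 1, 1)
        return app_out, mot_out


if __name__ == "__main__":
    fuse = BidirectionalCrossAttention(dim=64)
    app = torch.randn(2, 64, 44, 44)   # appearance features (hypothetical shape)
    mot = torch.randn(2, 64, 44, 44)   # motion (optical-flow) features
    a, m = fuse(app, mot)
    print(a.shape, m.shape)            # both remain (2, 64, 44, 44)
```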