Error propagation is a common but critical problem in online semi-supervised video object segmentation. We aim to suppress error propagation through a highly reliable correction mechanism. The key insight is to use reliable cues to disentangle correction from the conventional mask propagation process. We introduce two modulators, a propagation modulator and a correction modulator, which separately perform channel-wise re-calibration on the target frame embeddings according to local temporal correlations and reliable references, respectively. Specifically, we assemble the modulators in a cascaded propagation-correction scheme, which prevents the propagation modulator from overriding the effects of the reliable correction modulator. Although the reference frame with the ground-truth label provides reliable cues, it can differ greatly from the target frame and introduce uncertain or incomplete correlations. We augment the reference cues by adding reliable feature patches to a maintained pool, thus offering more comprehensive and expressive object representations to the modulators. In addition, a reliability filter is designed to retrieve reliable patches and pass them on to subsequent frames. Our model achieves state-of-the-art performance on the YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides a considerable performance gain by fully utilizing reliable guidance. Code is available at: https://github.com/JerryX1110/RPCMVOS.
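The cascaded propagation-correction idea can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that channel-wise re-calibration means scaling each channel of the target embedding by a sigmoid gate, and that each modulator receives one scalar cue per channel. The names `modulate`, `propagate_then_correct`, and `filter_reliable`, the cue vectors, and the `threshold` value are all hypothetical.

```python
import math
from typing import List, Sequence

# A flattened frame embedding: features[channel][position].
Features = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def modulate(features: Features, cue: Sequence[float]) -> Features:
    """Channel-wise re-calibration: scale each channel by a gate in (0, 1).

    `cue` holds one hypothetical scalar per channel. How the cue is derived
    (local temporal correlation or reliable references) is outside this sketch.
    """
    if len(features) != len(cue):
        raise ValueError("need exactly one cue value per channel")
    return [[sigmoid(c) * v for v in channel] for channel, c in zip(features, cue)]


def propagate_then_correct(
    features: Features,
    propagation_cue: Sequence[float],
    correction_cue: Sequence[float],
) -> Features:
    """Cascade the two modulators: propagation first, correction last.

    Applying the correction gate last means no later propagation step
    rescales the channels after the reliable cue has been applied.
    """
    propagated = modulate(features, propagation_cue)
    return modulate(propagated, correction_cue)


def filter_reliable(
    patches: Sequence[str], scores: Sequence[float], threshold: float = 0.8
) -> List[str]:
    """Keep patches whose (assumed) reliability score reaches the threshold."""
    return [p for p, s in zip(patches, scores) if s >= threshold]


if __name__ == "__main__":
    embedding = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels, 2 positions
    print(propagate_then_correct(embedding, [0.0, 0.0], [0.0, 0.0]))
    print(filter_reliable(["a", "b", "c"], [0.9, 0.5, 0.8]))
```

In this toy form the two gates simply multiply, so the order does not change the numbers. The ordering matters in a real network, where each modulator is followed by further layers. The threshold filter stands in for the reliability filter only loosely, since the abstract does not say how reliability is scored.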