Memory-based approaches have recently shown promising results on semi-supervised video object segmentation. These methods predict object masks frame by frame, aided by a memory that is frequently updated with previously predicted masks. In contrast to this per-frame inference, we investigate an alternative perspective that treats video object segmentation as clip-wise mask propagation. In this per-clip inference scheme, we update the memory at an interval and simultaneously process the set of consecutive frames (i.e., a clip) between memory updates. The scheme offers two potential benefits: an accuracy gain from clip-level optimization and an efficiency gain from computing multiple frames in parallel. To this end, we propose a new method tailored for per-clip inference. Specifically, we first introduce a clip-wise operation that refines the features based on intra-clip correlation. In addition, we employ a progressive matching mechanism for efficient information passing within a clip. With the synergy of these two modules and a newly proposed per-clip training scheme, our network achieves state-of-the-art performance on the YouTube-VOS 2018/2019 val sets (84.6% and 84.6%) and the DAVIS 2016/2017 val sets (91.9% and 86.1%). Furthermore, our model offers a favorable speed-accuracy trade-off across memory update intervals, which gives considerable flexibility in choosing the interval.
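The per-clip inference scheme can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the function names, the `segment_clip` callable standing in for the network, and the choice to store the last frame of each clip in memory are all assumptions made for illustration. Only the interval-based memory update and the joint processing of the frames between updates come from the text.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical memory entry: a (frame, mask) pair.
MemoryEntry = Tuple[Any, Any]


def per_clip_inference(
    frames: Sequence[Any],
    first_mask: Any,
    segment_clip: Callable[[Sequence[Any], List[MemoryEntry]], List[Any]],
    clip_length: int,
) -> List[Any]:
    """Propagate the first-frame mask through a video clip by clip.

    `segment_clip` is a stand-in for the network: it receives every frame
    of one clip together with the current memory and returns one mask per
    frame. `clip_length` plays the role of the memory update interval.
    """
    if clip_length < 1:
        raise ValueError("clip_length must be at least 1")

    # The annotated first frame initializes the memory.
    memory: List[MemoryEntry] = [(frames[0], first_mask)]
    masks: List[Any] = [first_mask]

    for start in range(1, len(frames), clip_length):
        clip = frames[start:start + clip_length]
        # All frames of the clip are handled in a single call, which is
        # where parallel computation would happen in a real model.
        clip_masks = segment_clip(clip, memory)
        masks.extend(clip_masks)
        # Assumption: the memory is updated once per clip, using the
        # clip's last frame and its predicted mask.
        memory.append((clip[-1], clip_masks[-1]))

    return masks
```

Setting `clip_length=1` in this sketch reduces it to per-frame inference with a memory update after every frame, while larger values trade fewer memory updates for more frames handled per call.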