Current state-of-the-art approaches for Semi-supervised Video Object Segmentation (Semi-VOS) propagate information from previous frames to generate a segmentation mask for the current frame. This yields high-quality segmentation in challenging scenarios such as appearance changes and occlusion, but it also incurs unnecessary computation for stationary or slow-moving objects, for which the change across frames is minimal. In this work, we exploit this observation by using temporal information to quickly identify frames with minimal change and skip the heavyweight mask generation step for them. To realize this efficiency, we propose a novel dynamic network that estimates the change across frames and, depending on the expected similarity, decides which path to take: computing the full network or reusing the previous frame's features. Experimental results show that our approach significantly improves inference speed without much accuracy degradation on challenging Semi-VOS datasets -- DAVIS 16, DAVIS 17, and YouTube-VOS. Furthermore, our approach can be applied to multiple Semi-VOS methods, demonstrating its generality. The code is available at https://github.com/HYOJINPARK/Reuse_VOS.
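The two-path decision described above can be illustrated with a minimal sketch. It is not the implementation in the linked repository: the change estimate here is a plain mean absolute pixel difference standing in for the learned estimator, the cheap path reuses the previous mask rather than the previous frame's features, and the names `mean_abs_change`, `segment_video`, `full_segment`, and the `threshold` value are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # grayscale image as rows of pixel values
Mask = List[List[int]]


def mean_abs_change(prev: Frame, curr: Frame) -> float:
    """Assumed stand-in for the change estimator: mean absolute pixel difference."""
    total, count = 0.0, 0
    for prev_row, curr_row in zip(prev, curr):
        for p, c in zip(prev_row, curr_row):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def segment_video(
    frames: Sequence[Frame],
    first_mask: Mask,
    full_segment: Callable[[Frame, Mask], Mask],
    threshold: float = 0.05,  # hypothetical value, not taken from the paper
) -> Tuple[List[Mask], List[str]]:
    """Pick the cheap or the heavy path per frame based on estimated change."""
    masks: List[Mask] = [first_mask]  # Semi-VOS: the first-frame mask is given
    paths: List[str] = ["given"]
    for t in range(1, len(frames)):
        change = mean_abs_change(frames[t - 1], frames[t])
        if change < threshold:
            # Cheap path: little change, so reuse the previous result.
            masks.append(masks[-1])
            paths.append("reuse")
        else:
            # Heavy path: run the full segmentation step.
            masks.append(full_segment(frames[t], masks[-1]))
            paths.append("full")
    return masks, paths


def threshold_segment(frame: Frame, prev_mask: Mask) -> Mask:
    """Toy placeholder for a full Semi-VOS network (ignores prev_mask)."""
    return [[1 if v > 0.5 else 0 for v in row] for row in frame]


frames = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],  # identical to the previous frame
    [[1.0, 0.0], [1.0, 0.0]],  # object has moved
]
first_mask = [[0, 1], [0, 1]]
masks, paths = segment_video(frames, first_mask, threshold_segment)
print(paths)  # ['given', 'reuse', 'full']
```

In this toy run the second frame is identical to the first and takes the reuse path, while the third frame changes every pixel and triggers the full computation. Comparing each frame only with its immediate predecessor can let slow drift accumulate unnoticed; a practical system would need some safeguard against that, which this sketch omits.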