TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss

Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. The task is to track class-agnostic objects through a video given an initial target mask. Various approaches have been developed for it, based on online learning, memory networks, and optical flow. These methods achieve high accuracy but are hard to use in real-world applications because of their slow inference and high complexity. Template matching methods were devised to address this with fast processing, but previous models of this kind sacrifice considerable accuracy. We introduce a novel semi-VOS model based on template matching and a temporal consistency loss that narrows the performance gap to heavy models while substantially speeding up inference. Our template matching method consists of short-term and long-term matching. Short-term matching enhances target object localization, while long-term matching improves fine details and handles changes in object shape through the newly proposed adaptive template attention module. However, long-term matching causes error propagation, because past estimated results flow into the template when it is updated. To mitigate this problem, we also propose a temporal consistency loss, which improves temporal coherence between neighboring frames by adopting the concept of a transition matrix. Our model obtains a J&F score of 79.5% at 73.8 FPS on the DAVIS16 benchmark.
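
The abstract does not spell out the temporal consistency loss, so the following is only a minimal sketch of the general idea, not the formulation used in the paper. It assumes that a "transition" is the per-pixel difference between the foreground probability maps of two neighboring frames, and that the loss penalizes the squared gap between the predicted transition and the ground-truth transition. The function names `transition` and `temporal_consistency_loss` are hypothetical.

```python
import numpy as np


def transition(prev_mask, curr_mask):
    """Per-pixel change between two consecutive masks (assumed definition).

    Values lie in [-1, 1]: +1 where the object appears, -1 where it leaves.
    """
    return np.asarray(curr_mask, dtype=float) - np.asarray(prev_mask, dtype=float)


def temporal_consistency_loss(pred_prev, pred_curr, gt_prev, gt_curr):
    """Mean squared gap between predicted and ground-truth transitions.

    This is an illustrative stand-in, not the loss defined in the paper.
    """
    pred_t = transition(pred_prev, pred_curr)
    gt_t = transition(gt_prev, gt_curr)
    return float(np.mean((pred_t - gt_t) ** 2))


# Toy 4x4 example: a one-pixel object moves one step to the right.
gt_prev = np.zeros((4, 4))
gt_prev[1, 1] = 1.0
gt_curr = np.zeros((4, 4))
gt_curr[1, 2] = 1.0

# A prediction that follows the motion exactly incurs no penalty.
loss_perfect = temporal_consistency_loss(gt_prev, gt_curr, gt_prev, gt_curr)

# A prediction that stays frozen on the previous mask is penalized.
loss_frozen = temporal_consistency_loss(gt_prev, gt_prev, gt_prev, gt_curr)
```

Under these assumptions, a prediction that reproduces the ground-truth motion scores zero, while one that repeats the previous frame's mask is penalized at exactly the pixels where the object moved. That is the kind of behavior a temporal term would need in order to counter stale template information.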
