We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation. Distinct from previous self-supervised VOS methods, our approach is based on a discriminative learning loss formulation that takes into account both object and background information to ensure object-background discriminability, rather than using only object appearance. The discriminative learning loss comprises a cutout-based reconstruction term (a cutout region is a part of a frame whose pixels are replaced with some constant values) and a tag prediction loss term. The cutout-based reconstruction term utilizes a simple cutout scheme to learn the pixel-wise correspondence between the current and previous frames, in order to reconstruct the original current frame containing the added cutout region. The introduced cutout patch guides the model to attend to the less significant features of the object of interest as much as the significant ones, thereby implicitly equipping the model to handle occlusion scenarios. The tag prediction term then encourages object-background separability by grouping the tags of all pixels in the cutout region so that they are similar, while separating them from the tags of the remaining pixels of the reconstructed frame. Additionally, we introduce a zoom-in scheme that addresses the problem of small object segmentation by capturing fine structural information at multiple scales. Our proposed approach, termed CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and YouTube-VOS. A detailed ablation study showcases the importance of the proposed loss formulation in effectively capturing object-background discriminability, and the impact of our zoom-in scheme on accurately segmenting small-sized objects.
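To make the cutout scheme concrete, the following is a minimal sketch, not the authors' released code, of how a cutout region might be applied to a frame before reconstruction. The function name, square patch shape, patch size, and fill value are illustrative assumptions; the abstract only states that a region's pixels are replaced with constant values.

```python
import torch

def apply_cutout(frame: torch.Tensor, size: int = 64, fill: float = 0.0):
    """Replace a random square region of `frame` with a constant value.

    frame: (C, H, W) image tensor.
    Returns the cutout frame and a binary mask marking the erased
    region (1 inside the cutout, 0 elsewhere).
    Illustrative sketch only; patch shape/size/fill are assumptions.
    """
    _, h, w = frame.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()

    # Binary mask of the cutout region, reusable by the loss terms.
    mask = torch.zeros(1, h, w)
    mask[:, top:top + size, left:left + size] = 1.0

    # Overwrite the selected region with the constant fill value.
    cut = frame.clone()
    cut[:, top:top + size, left:left + size] = fill
    return cut, mask
```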
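The tag prediction term is described only at a high level in the abstract. As a rough illustration of the stated behavior, grouping tags inside the cutout region while separating them from the rest of the frame, a pull-push (associative-embedding-style) loss over per-pixel tag embeddings could look like the sketch below. The function name, embedding layout, margin, and the exact pull/push form are assumptions, not the paper's formulation.

```python
import torch

def tag_grouping_loss(tags: torch.Tensor, cutout_mask: torch.Tensor,
                      margin: float = 1.0) -> torch.Tensor:
    """Pull-push style grouping loss over per-pixel tag embeddings.

    tags: (D, H, W) per-pixel tag embeddings predicted by the model.
    cutout_mask: (1, H, W) binary mask, 1 inside the cutout region.
    Pulls tags within each region toward that region's mean tag, and
    pushes the two region means at least `margin` apart.
    Illustrative sketch only; the actual loss form is an assumption.
    """
    d = tags.shape[0]
    flat = tags.reshape(d, -1)                 # (D, H*W)
    m = cutout_mask.reshape(-1).bool()         # (H*W,)

    inside, outside = flat[:, m], flat[:, ~m]
    mu_in = inside.mean(dim=1, keepdim=True)   # mean tag of cutout pixels
    mu_out = outside.mean(dim=1, keepdim=True) # mean tag of the rest

    # Pull: tags in each region should be similar to their region mean.
    pull = ((inside - mu_in) ** 2).sum(dim=0).mean() \
         + ((outside - mu_out) ** 2).sum(dim=0).mean()
    # Push: the two region means should be separated by the margin.
    push = torch.relu(margin - (mu_in - mu_out).norm()) ** 2
    return pull + push
```

The design intuition, under the same assumptions, is that the pull term enforces the "grouping" of cutout-region tags mentioned in the abstract, while the push term enforces their separation from the tags of the remaining reconstructed-frame pixels, together encouraging object-background discriminability.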