Unsupervised video object segmentation aims to segment a target object in a video without a ground-truth mask in the initial frame. This challenging task requires extracting features for the most salient common objects within a video sequence. The difficulty can be alleviated by using motion information such as optical flow, but relying only on information between adjacent frames results in poor connectivity between distant frames and degraded performance. To solve this problem, we propose a novel prototype memory network architecture. The proposed model effectively extracts RGB and motion information by computing superpixel-based component prototypes from the input RGB images and optical flow maps. In addition, the model scores the usefulness of the component prototypes in each frame based on a self-learning algorithm, adaptively stores the most useful prototypes in memory, and discards obsolete prototypes. We use the prototypes in the memory bank to predict the mask of the next query frame, which strengthens the association between distant frames and helps produce accurate mask predictions. Our method is evaluated on three datasets, achieving state-of-the-art performance. We demonstrate the effectiveness of the proposed model through various ablation studies.
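As a rough illustration of the adaptive memory idea described above, the sketch below shows a toy prototype memory bank that scores incoming component prototypes, keeps the most useful ones, and discards the rest. The class name `PrototypeMemoryBank`, the cosine-similarity usefulness score, and the top-k eviction rule are illustrative assumptions, not the paper's actual scoring or update mechanism.

```python
import numpy as np

class PrototypeMemoryBank:
    """Toy memory bank of component prototypes (feature vectors) with
    usefulness scores; evicts the least useful entries once capacity is
    exceeded. A hypothetical simplification of the described memory update."""

    def __init__(self, capacity=64, feat_dim=256):
        self.capacity = capacity
        self.prototypes = np.empty((0, feat_dim), dtype=np.float32)
        self.scores = np.empty((0,), dtype=np.float32)

    def score(self, prototypes, query_feat):
        # Hypothetical usefulness score: cosine similarity between each
        # prototype and a global descriptor of the query frame.
        p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
        q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
        return p @ q

    def update(self, frame_prototypes, query_feat):
        # Score new prototypes, merge them into the bank, then keep only
        # the top-`capacity` entries and discard obsolete ones.
        new_scores = self.score(frame_prototypes, query_feat)
        self.prototypes = np.concatenate([self.prototypes, frame_prototypes], axis=0)
        self.scores = np.concatenate([self.scores, new_scores], axis=0)
        if len(self.scores) > self.capacity:
            keep = np.argsort(self.scores)[-self.capacity:]
            self.prototypes = self.prototypes[keep]
            self.scores = self.scores[keep]

    def read(self, query_feat, top_k=8):
        # Retrieve the stored prototypes most relevant to the current query frame.
        sims = self.score(self.prototypes, query_feat)
        idx = np.argsort(sims)[-top_k:]
        return self.prototypes[idx]


# Usage: per frame, extract component prototypes (e.g. pooled superpixel
# features from the RGB and optical-flow streams), update the bank, then
# read the stored prototypes to guide mask prediction for the next frame.
bank = PrototypeMemoryBank(capacity=64, feat_dim=256)
for _ in range(10):                                          # dummy 10-frame video
    protos = np.random.randn(20, 256).astype(np.float32)     # fake superpixel prototypes
    query = np.random.randn(256).astype(np.float32)          # fake query-frame descriptor
    bank.update(protos, query)
    memory_protos = bank.read(query, top_k=8)
```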