Many deep-learning-based unsupervised video object segmentation algorithms suffer from large numbers of model parameters and heavy computation, which significantly limits their practical application. This paper proposes a motion-guided video object segmentation network that considerably reduces the number of model parameters and the computational cost while improving segmentation performance. The model comprises a dual-stream network, a motion guidance module, and a multi-scale progressive fusion module. Specifically, RGB images and optical flow estimates are fed into the dual-stream network to extract object appearance features and motion features. The motion guidance module then extracts semantic information from the motion features through local attention, which guides the appearance features to learn rich semantic information. Finally, the multi-scale progressive fusion module takes the output features from each stage of the dual-stream network and gradually integrates the deep features into the shallow ones, improving edge segmentation quality. Extensive evaluations on three standard datasets demonstrate the superior performance of the proposed method.
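The exact form of the motion guidance module is defined in the paper body; as a rough intuition for how motion features can guide appearance features, the following is a minimal NumPy sketch (the function name, the use of a sigmoid-pooled attention map, and the residual combination are illustrative assumptions, not the paper's actual design):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_guidance(appearance, motion):
    """Hypothetical sketch: derive a spatial attention map from motion
    features and use it to reweight the appearance features.

    appearance, motion: arrays of shape (C, H, W).
    """
    # Pool motion features over channels and squash to (0, 1)
    # to obtain a single-channel spatial attention map.
    attn = sigmoid(motion.mean(axis=0, keepdims=True))  # shape (1, H, W)
    # Residual combination: attended appearance plus the original,
    # so guidance reweights features without discarding them.
    return appearance * attn + appearance

rng = np.random.default_rng(0)
app = rng.standard_normal((8, 16, 16))   # appearance features
mot = rng.standard_normal((8, 16, 16))   # motion features
out = motion_guidance(app, mot)
print(out.shape)  # (8, 16, 16)
```

In the actual model this operation would act on learned feature maps at each stage of the dual-stream network, with local attention rather than a global channel pooling.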