Video segmentation, the partitioning of video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep-learning-based approaches to video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research, generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out open issues in this field and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast-advancing field: https://github.com/tfzhou/VS-Survey.