Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing, to name a few. Recently, owing to the renaissance of connectionism in computer vision, there has been an influx of deep learning-based approaches for video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research, generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also provide a detailed overview of representative literature on both methods and datasets. Additionally, we present quantitative performance comparisons of the reviewed methods on benchmark datasets. Finally, we point out a set of open issues in this field and suggest possible opportunities for further research. A public website is provided to continuously track recent developments in this fast-advancing field: https://github.com/tfzhou/VS-Survey.