Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, e.g., visual effects assistance in film production, scene understanding in autonomous driving, and virtual background creation in video conferencing. Recently, driven by the renaissance of connectionism in computer vision, numerous deep learning based approaches to video segmentation have emerged and delivered compelling performance. In this survey, we comprehensively review two basic lines of research in this area, i.e., generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also provide a detailed overview of representative literature on both methods and datasets. Additionally, we present quantitative performance comparisons of the reviewed methods on benchmark datasets. Finally, we point out a set of open issues in this field and suggest possible opportunities for further research.