Video object segmentation (VOS) aims to segment a particular object throughout an entire video clip. State-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets. However, because the target objects in these datasets are usually salient, dominant, and isolated, VOS under complex scenes has rarely been studied. To revisit VOS and make it more applicable in the real world, we collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks. The most notable feature of the MOSE dataset is its complex scenes with crowded and occluded objects: target objects are commonly occluded by others and disappear in some frames. To analyze the dataset, we benchmark 18 existing VOS methods under 4 different settings on MOSE and conduct comprehensive comparisons. The experiments show that current VOS algorithms cannot perceive objects well in complex scenes. For example, under the semi-supervised VOS setting, the highest J&F achieved by existing state-of-the-art VOS methods is only 59.4% on MOSE, much lower than their ~90% J&F on DAVIS. The results reveal that although excellent performance has been achieved on existing benchmarks, unresolved challenges remain under complex scenes, and more effort is needed to explore them. The MOSE dataset has been released at https://henghuiding.github.io/MOSE.
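For readers unfamiliar with the J&F score quoted above: following the DAVIS benchmark convention, it is commonly taken as the average of region similarity J (the Jaccard index, or intersection over union, between predicted and ground-truth masks) and contour accuracy F (a boundary F-measure). The snippet below is a minimal sketch of J and of the final averaging step only, assuming binary masks given as nested lists of 0/1 values. The function names are illustrative and are not taken from the MOSE evaluation toolkit; F is omitted because it requires boundary extraction and matching.

```python
from typing import Sequence

# Illustrative type: a binary mask as rows of 0/1 values (an assumption,
# real toolkits typically operate on image arrays).
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


def j_and_f(j_mean: float, f_mean: float) -> float:
    """Combine mean region similarity and mean contour accuracy."""
    return (j_mean + f_mean) / 2.0


# Toy example: 2 overlapping pixels out of 4 covered by either mask.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
gt_mask = [[1, 0, 0],
           [0, 1, 1]]
print(region_similarity(pred_mask, gt_mask))  # 0.5
```

The empty-mask case matters for a dataset like MOSE, where targets disappear in some frames; how such frames are scored is a convention of the evaluation protocol, and the choice shown here is only an assumption.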