Although deep learning methods have achieved strong video object recognition performance in recent years, perceiving heavily occluded objects in a video remains a very challenging task. To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in occluded scenarios. OVIS consists of 296k high-quality instance masks across 901 occluded scenes. While the human vision system can perceive occluded objects through contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, all baseline methods suffer a significant performance degradation of about 80% on the heavily occluded object group, which demonstrates that there is still a long way to go in understanding obscured objects and videos in complex real-world scenarios. To facilitate research on new paradigms for video understanding systems, we launched a challenge based on the OVIS dataset, and the top-performing submitted algorithms achieve much higher performance than our baselines. In this paper, we introduce the OVIS dataset and further dissect it by analyzing the results of the baselines and submitted methods. The OVIS dataset and challenge information can be found at http://songbai.site/ovis .