Since it was first proposed, the Video Instance Segmentation (VIS) task has attracted extensive research effort on architecture modeling to boost performance. Although great advances have been achieved in both online and offline paradigms, there remain insufficient means to identify model errors and to distinguish discrepancies between methods; likewise, approaches that correctly reflect a model's performance in recognizing object instances of various temporal lengths are barely available. More importantly, spatial segmentation and temporal association, the fundamental model abilities demanded by the task, are still understudied in terms of both evaluation and their interaction mechanism. In this paper, we introduce TIVE, a Toolbox for Identifying Video instance segmentation Errors. By directly operating on output prediction files, TIVE defines isolated error types and weights each type's damage to mAP, in order to distinguish model characteristics. By decomposing localization quality along the spatial and temporal dimensions, a model's potential drawbacks in spatial segmentation and temporal association can be revealed. TIVE can also report mAP over instance temporal length for real applications. We conduct extensive experiments with the toolbox to further illustrate how spatial segmentation and temporal association affect each other. We expect the analysis provided by TIVE to give researchers more insights and to guide the community toward more meaningful explorations of video instance segmentation. The proposed toolbox is available at https://github.com/wenhe-jia/TIVE.
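As a rough illustration of how a per-error-type "damage to mAP" can be quantified, the following is a plausible formalization in the fixing-oracle style popularized by TIDE, the image-domain predecessor of this kind of analysis; the exact definition used by TIVE may differ:

\[
\Delta\mathrm{mAP}_{e} \;=\; \mathrm{mAP}_{\mathrm{fix}(e)} \;-\; \mathrm{mAP},
\]

where \(\mathrm{fix}(e)\) denotes re-evaluating the prediction file after an oracle corrects all errors of type \(e\) while leaving all other predictions untouched, so that each error type's contribution is measured in isolation.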