Video Annotation for Visual Tracking via Selection and Refinement

Deep-learning-based visual trackers require offline pre-training on large video datasets with accurate bounding box annotations, which are labor-intensive to obtain. We present a new framework to facilitate bounding box annotation for video sequences. It uses a selection-and-refinement strategy to automatically improve the preliminary annotations generated by tracking algorithms. A temporal assessment network (T-Assess Net) captures the temporal coherence of target locations and selects reliable tracking results by measuring their quality. A visual-geometry refinement network (VG-Refine Net) then further enhances the selected tracking results by considering both target appearance and temporal geometry constraints, allowing inaccurate tracking results to be corrected. Together, the two networks provide a principled approach to ensuring the quality of automatic video annotation. Experiments on large-scale tracking benchmarks demonstrate that our method delivers highly accurate bounding box annotations and reduces human labor by 94.0%, yielding an effective means of further boosting tracking performance with augmented training data.
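The selection-and-refinement control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `threshold` parameter, and the two toy stand-ins for T-Assess Net and VG-Refine Net are all assumptions made for illustration, and the abstract does not say how unselected frames are handled (here they are simply flagged for manual annotation).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class FrameAnnotation:
    frame_index: int
    box: Box
    needs_human: bool  # True when the tracker output was judged unreliable


def select_and_refine(
    tracked_boxes: Sequence[Box],
    assess: Callable[[Sequence[Box], int], float],
    refine: Callable[[Sequence[Box], int], Box],
    threshold: float = 0.5,  # hypothetical quality cut-off
) -> List[FrameAnnotation]:
    """Keep and refine tracker outputs scored as reliable; flag the rest."""
    annotations = []
    for i, box in enumerate(tracked_boxes):
        if assess(tracked_boxes, i) >= threshold:
            # Selection passed: refine the tracker's box.
            annotations.append(FrameAnnotation(i, refine(tracked_boxes, i), False))
        else:
            # Selection failed: leave the box untouched for manual labeling.
            annotations.append(FrameAnnotation(i, box, True))
    return annotations


def toy_assess(boxes: Sequence[Box], i: int) -> float:
    """Toy stand-in for a learned quality score: penalize large center jumps."""
    if i == 0:
        return 1.0
    x0, y0, w0, h0 = boxes[i - 1]
    x1, y1, w1, h1 = boxes[i]
    jump = abs((x1 + w1 / 2) - (x0 + w0 / 2)) + abs((y1 + h1 / 2) - (y0 + h0 / 2))
    return 1.0 / (1.0 + jump)


def toy_refine(boxes: Sequence[Box], i: int) -> Box:
    """Toy stand-in for a learned refiner: returns the box unchanged."""
    return boxes[i]


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 10.0, 10.0), (0.5, 0.0, 10.0, 10.0), (50.0, 50.0, 10.0, 10.0)]
    for ann in select_and_refine(boxes, toy_assess, toy_refine):
        print(ann)
```

In this sketch, the share of frames with `needs_human` set is what a human annotator would still have to label; a real system would replace both toy functions with trained networks.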

