Object detection and tracking in videos are essential, computationally demanding building blocks of current and future visual perception systems. To narrow the gap between the efficiency of available methods and the computational requirements of real-world applications, we propose to rethink one of the most successful methods for image object detection, Faster R-CNN, and extend it to the video domain. Specifically, we extend the detection framework to learn instance-level embeddings, which prove beneficial for data association and re-identification. Focusing on the computational aspects of detection and tracking, our method reaches the high computational efficiency that such applications require while remaining competitive with recent state-of-the-art methods, as shown in our experiments on standard object tracking benchmarks.
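To illustrate how instance-level embeddings can support data association, the following is a minimal sketch and not the method of this work: it assumes each existing track and each new detection already carries an embedding vector, and it links them by greedy matching on cosine similarity. The function names `cosine_similarity` and `associate`, the `min_similarity` threshold, and the greedy strategy itself are all assumptions made for illustration; the abstract does not specify the association procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily match tracks to detections by descending cosine similarity.

    Hypothetical helper: returns (matches, unmatched_tracks, unmatched_detections),
    where matches maps a track index to a detection index.
    """
    # Collect every (similarity, track, detection) pair above the threshold.
    candidates = []
    for t, track_emb in enumerate(track_embeddings):
        for d, det_emb in enumerate(detection_embeddings):
            similarity = cosine_similarity(track_emb, det_emb)
            if similarity >= min_similarity:
                candidates.append((similarity, t, d))

    # Take the most similar pairs first; each track and detection is used once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    matches = {}
    used_detections = set()
    for _, t, d in candidates:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)

    unmatched_tracks = [t for t in range(len(track_embeddings)) if t not in matches]
    unmatched_detections = [
        d for d in range(len(detection_embeddings)) if d not in used_detections
    ]
    return matches, unmatched_tracks, unmatched_detections


# Toy example: two tracks, three detections with 2-D embeddings.
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches, lost_tracks, new_detections = associate(tracks, detections)
```

In this toy example, detections left unmatched would typically start new tracks, and tracks left unmatched would be candidates for later re-identification. An optimal assignment (for instance, the Hungarian algorithm) could replace the greedy loop at a higher computational cost.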