Accurate detection and tracking of objects is vital for effective video understanding. In previous work, the two tasks have been combined such that tracking relies heavily on detection, while detection benefits only marginally from tracking. To increase synergy, we propose to integrate the tasks more tightly by conditioning the object detection in the current frame on tracklets computed in prior frames. With this approach, the detected objects not only have high detection responses but are also more coherent with the existing tracklets. This greater coherence leads to estimated object trajectories that are smoother and more stable than the jittery paths obtained without tracklet-conditioned detection. In extensive experiments, this approach is shown to achieve state-of-the-art performance in both detection and tracking accuracy, as well as noticeable improvements in tracking stability.
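
The abstract does not say how the conditioning is implemented, so the following is only a minimal sketch of the general idea, not the method proposed here. It assumes a hypothetical re-scoring rule in which each candidate box's detection score is blended with its overlap (IoU) against a box extrapolated from the tracklet under a constant-velocity assumption. The names `predict_next_box`, `tracklet_conditioned_scores`, and the `weight` parameter are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict_next_box(tracklet: List[Box]) -> Box:
    """Assumed motion model: constant-velocity extrapolation
    from the last two boxes of the tracklet."""
    if len(tracklet) < 2:
        return tracklet[-1]
    prev, last = tracklet[-2], tracklet[-1]
    return tuple(2 * l - p for l, p in zip(last, prev))


def tracklet_conditioned_scores(
    candidates: List[Box],
    det_scores: List[float],
    tracklet: List[Box],
    weight: float = 0.5,
) -> List[float]:
    """Blend each detection score with coherence to the tracklet.

    `weight` (hypothetical) controls how much the tracklet prior
    counts relative to the raw detection response.
    """
    predicted = predict_next_box(tracklet)
    return [
        (1 - weight) * s + weight * iou(c, predicted)
        for c, s in zip(candidates, det_scores)
    ]


# A tracklet moving 2 px to the right per frame.
tracklet = [(0.0, 0.0, 10.0, 10.0), (2.0, 0.0, 12.0, 10.0)]
# Candidate 0 continues the motion; candidate 1 has a slightly
# higher raw score but lies far from the predicted position.
candidates = [(4.0, 0.0, 14.0, 10.0), (20.0, 0.0, 30.0, 10.0)]
det_scores = [0.8, 0.9]

scores = tracklet_conditioned_scores(candidates, det_scores, tracklet)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case the coherent candidate is selected even though its raw detection score is lower, which is the kind of behavior that would yield smoother trajectories. A learned model could replace both the motion prediction and the fixed blending rule.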