Discriminative correlation filter (DCF) based trackers have shown considerable success in visual object tracking. These trackers often rely on low- to mid-level features such as histograms of oriented gradients (HoG) and mid-layer activations from convolutional neural networks (CNNs). We argue that adding semantically higher-level information to the tracked features may provide further robustness in challenging cases such as viewpoint changes. Deep salient object detection is one source of such high-level features, as it makes use of semantic information to highlight the important regions in a given scene. In this work, we propose an improvement over DCF based trackers that combines the filter responses computed from saliency with the filter responses computed from the other features. The combination applies an adaptive weight to the saliency based filter responses, and this weight is selected automatically according to the temporal consistency of visual saliency. We show that our method consistently improves a baseline DCF based tracker, especially in challenging cases, and outperforms the state of the art. Our improved tracker operates at 9.3 fps, compared with 11 fps for the baseline, which amounts to a small computational overhead (about 15% lower frame rate).
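The adaptive combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the consistency measure (a clipped Pearson correlation between consecutive saliency maps), the additive fusion rule, and the names `temporal_consistency`, `fuse_responses`, and `locate_peak` are all hypothetical and may differ from the method in the paper.

```python
import numpy as np


def temporal_consistency(prev_saliency, curr_saliency, eps=1e-12):
    """Hypothetical consistency score in [0, 1].

    Assumption: consistency is measured as the Pearson correlation between
    two consecutive saliency maps, clipped to be non-negative.
    """
    a = prev_saliency.ravel() - prev_saliency.mean()
    b = curr_saliency.ravel() - curr_saliency.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:
        # A constant map carries no usable saliency signal.
        return 0.0
    return float(np.clip(np.dot(a, b) / denom, 0.0, 1.0))


def fuse_responses(feature_response, saliency_response, weight):
    """Assumed fusion rule: add the weighted saliency response to the
    response obtained from the other features."""
    return feature_response + weight * saliency_response


def locate_peak(response):
    """Return the (row, col) index of the maximum of a response map."""
    return tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))


# Toy example: the feature response slightly prefers (1, 1),
# while the saliency response prefers (3, 3).
feature_response = np.zeros((5, 5))
feature_response[1, 1] = 1.0
feature_response[3, 3] = 0.9

saliency_response = np.zeros((5, 5))
saliency_response[3, 3] = 1.0

# Saliency maps from two consecutive frames (identical here, so fully consistent).
prev_saliency = saliency_response.copy()
curr_saliency = saliency_response.copy()

weight = temporal_consistency(prev_saliency, curr_saliency)
fused = fuse_responses(feature_response, saliency_response, weight)
peak = locate_peak(fused)
```

In this toy setting, a temporally consistent saliency map receives a weight near 1 and moves the estimated target position toward the salient region, whereas an inconsistent map receives a weight near 0 and leaves the baseline response essentially unchanged.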