The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object's appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance on multiple benchmarks.
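As a concrete illustration of the fully-convolutional Siamese scoring idea described above, the following is a minimal sketch in PyTorch: the same convolutional embedding is applied to the target exemplar and to a larger search region, and the exemplar's feature map is used as a correlation kernel over the search feature map, producing a score map whose peak locates the target. The name embed_net, the patch sizes, and the feature-map shapes are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def siamese_score_map(embed_net, exemplar, search):
    """Cross-correlate exemplar and search embeddings (illustrative sketch).

    embed_net : any fully-convolutional feature extractor (hypothetical)
    exemplar  : target patch z, e.g. shape (1, 3, 127, 127)
    search    : larger search region x, e.g. shape (1, 3, 255, 255)
    Returns a 2-D score map; its maximum gives the target location.
    """
    z = embed_net(exemplar)   # e.g. (1, C, 6, 6) feature map of the exemplar
    x = embed_net(search)     # e.g. (1, C, 22, 22) feature map of the search region
    # Treat the exemplar embedding as a convolution kernel: sliding it over the
    # search embedding yields a dense similarity (score) map in one pass.
    score = F.conv2d(x, z)    # e.g. (1, 1, 17, 17)
    return score
```

Because both branches share weights and are fully convolutional, the score map for the whole search region is computed in a single forward pass, with no online fine-tuning of the network, which is what allows tracking beyond real-time frame rates.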