Recently, we have seen rapid development of Deep Neural Network (DNN)-based visual tracking solutions. Some trackers combine DNNs with Discriminative Correlation Filters (DCF) to extract semantic features and deliver state-of-the-art tracking accuracy. However, these solutions are highly compute-intensive and require long processing times, so their real-time performance cannot be guaranteed. To deliver both high accuracy and reliable real-time performance, we propose a novel tracker called SiamVGG\footnote{https://github.com/leeyeehoo/SiamVGG}. It combines a Convolutional Neural Network (CNN) backbone with a cross-correlation operator and takes advantage of the features from exemplary images for more accurate object tracking. The architecture of SiamVGG is customized from VGG-16, with parameters shared by both the exemplary images and the input video frames. We evaluate SiamVGG on the OTB-2013/50/100 and VOT 2015/2016/2017 datasets, where it achieves state-of-the-art accuracy while maintaining real-time performance of 50 FPS on a GTX 1080Ti. In the VOT2017 Challenge, our design achieves a 2% higher Expected Average Overlap (EAO) than ECO and C-COT.
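The abstract describes the core mechanism only at a high level: one CNN embeds both the exemplar and the search frame with the same weights, and cross-correlating the two feature maps yields a response map whose peak is meant to mark the target. The following is a minimal NumPy sketch of that idea under loudly stated assumptions: a single toy convolution layer (`embed`) stands in for the VGG-16-derived backbone, the weights are random rather than trained, and `embed`, `cross_correlate` and all array sizes are hypothetical, not taken from the SiamVGG code.

```python
import numpy as np


def embed(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the shared backbone: one 'valid' conv + ReLU.

    image:   (C_in, H, W)
    kernels: (C_out, C_in, kh, kw); the SAME kernels are used for the
             exemplar and the search frame, which is what makes it Siamese.
    """
    c_out, _, kh, kw = kernels.shape
    _, h, w = image.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[:, i:i + kh, j:j + kw]
            # Contract over (C_in, kh, kw) -> one value per output channel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)


def cross_correlate(exemplar_feat: np.ndarray, search_feat: np.ndarray) -> np.ndarray:
    """Slide the exemplar features over the search features, summing the
    element-wise products over all channels, to get a 2-D response map."""
    _, eh, ew = exemplar_feat.shape
    _, sh, sw = search_feat.shape
    resp = np.zeros((sh - eh + 1, sw - ew + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(exemplar_feat * search_feat[:, i:i + eh, j:j + ew])
    return resp


# Toy usage with random (untrained) weights and a synthetic frame.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 3, 3, 3))   # shared weights for both branches
search = rng.standard_normal((3, 32, 32))     # toy "video frame"
exemplar = search[:, 10:18, 12:20]            # toy target crop at offset (10, 12)

ef = embed(exemplar, kernels)                 # exemplar features, shape (4, 6, 6)
sf = embed(search, kernels)                   # search features, shape (4, 30, 30)
response = cross_correlate(ef, sf)            # response map, shape (25, 25)
peak = np.unravel_index(np.argmax(response), response.shape)
print("response map:", response.shape, "peak at", peak)
```

Because the toy weights are untrained, the peak is not guaranteed to land at the crop's true offset (10, 12). What the sketch does show is the effect of weight sharing: the search features at that offset are identical to the exemplar features, so the response there equals the exemplar's squared feature norm. In a trained tracker, learning the backbone weights is what is intended to make the target location stand out.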