Visual object tracking has traditionally been handled by two distinct paradigms: either learning a model of the object's appearance exclusively online, or matching the object against the target in an offline-trained embedding space. Despite recent successes, each approach suffers from an intrinsic constraint. Online-only approaches lack generalization in the model they learn and are therefore inferior at target regression, while offline-only approaches (e.g., convolutional siamese trackers) lack target-specific context and are therefore neither discriminative enough to handle distractors nor robust enough to deformation. We therefore propose an online module with an attention mechanism that enables offline siamese networks to extract target-specific features under an L2 error. We further propose a filter update strategy that adapts to treacherous background noise for discriminative learning, and a template update strategy that handles large target deformations for robust learning. The effectiveness of our approach is validated by consistent improvements over three siamese baselines: SiamFC, SiamRPN++, and SiamMask. Beyond that, our model based on SiamRPN++ achieves the best results on six popular tracking benchmarks and operates beyond real-time.
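The online module described above can be sketched minimally: a channel-attention re-weighting of the offline backbone's features, plus a per-channel filter updated online by gradient descent on the L2 error between its response map and a target label. This is a hedged illustration under assumed shapes; the function names, the Gaussian label, and the learning rate are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def channel_attention(feat):
    # feat: (C, H, W) feature map from the offline siamese backbone.
    # Global-average-pool each channel, then softmax to get channel weights
    # that emphasize target-specific channels (a simple attention sketch).
    pooled = feat.mean(axis=(1, 2))                 # (C,)
    w = np.exp(pooled - pooled.max())
    w = w / w.sum()
    return feat * w[:, None, None]                  # re-weighted features

def response(filt, feat):
    # Per-pixel score: weighted sum of channels, giving an (H, W) response map.
    return np.einsum('chw,c->hw', feat, filt)

def online_l2_step(filt, feat, label, lr=1e-3):
    # One gradient step minimizing the L2 error sum((response - label)^2)
    # with respect to the per-channel filter weights.
    residual = response(filt, feat) - label         # (H, W)
    grad = 2.0 * np.einsum('hw,chw->c', residual, feat)
    return filt - lr * grad
```

In this sketch the filter plays the role of the online, target-specific component, while the attention re-weighting stands in for the proposed attention mechanism on top of the frozen offline features; the quadratic objective makes each update step cheap enough for real-time operation.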