Visual representation is crucial to the performance of a visual tracking method. Conventionally, the visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors, which were developed generically without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patches, via a strong temporal slowness constraint and stacked convolutional autoencoders. The deep slow local representations are learned offline on unlabeled data and transferred to the observational model of our proposed tracker. The proposed observational model retains old training samples to alleviate drift, and collects negative samples that are coherent with the target's motion pattern for better discriminative tracking. With the learned representation and online training samples, a logistic regression classifier is adopted to distinguish the target from the background, and is retrained online to adapt to appearance changes. Subsequently, the observational model is integrated into a particle filter framework to perform visual tracking. Experimental results on various challenging benchmark sequences demonstrate that the proposed tracker performs favourably against several state-of-the-art trackers.
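The core of the pipeline described above — a learned feature extractor feeding an online-retrained logistic regression classifier that scores particle-filter candidates — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the stacked convolutional autoencoder trained with a slowness constraint is replaced here by a hypothetical fixed random projection, and all data are synthetic toy patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned deep slow feature extractor.
# (In the paper this is a stacked convolutional autoencoder trained offline
# with a temporal slowness constraint; here, a fixed random projection.)
W_feat = rng.standard_normal((64, 256))

def extract_features(patches):
    """Map raw image patches (N, 256) into a 64-d representation space."""
    return np.tanh(patches @ W_feat.T)

class LogisticObservationModel:
    """Online logistic regression distinguishing target from background."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, X):
        z = X @ self.w + self.b
        return 1.0 / (1.0 + np.exp(-z))

    def update(self, X, y, epochs=50):
        """A few gradient-descent epochs; rerun online as appearance changes."""
        for _ in range(epochs):
            p = self.predict_proba(X)
            self.w -= self.lr * (X.T @ (p - y)) / len(y)
            self.b -= self.lr * np.mean(p - y)

# Toy training data: positive (target) and negative (background) patches.
pos = rng.standard_normal((20, 256)) + 1.0
neg = rng.standard_normal((20, 256)) - 1.0
X = extract_features(np.vstack([pos, neg]))
y = np.concatenate([np.ones(20), np.zeros(20)])

model = LogisticObservationModel(dim=64)
model.update(X, y)

# One particle-filter observation step: score candidate patches and
# keep the most target-like one as the tracking result.
particles = np.vstack([pos[:1] + 0.1 * rng.standard_normal((1, 256)),
                       neg[:1]])
scores = model.predict_proba(extract_features(particles))
best = int(np.argmax(scores))
```

In the full tracker, `update` would also be fed the retained old training samples (to alleviate drift) and the motion-coherent negatives described above, and the particle set would be propagated by the particle filter's dynamic model rather than fixed as here.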