While recent years have witnessed remarkable progress in feature representation for visual tracking, the problem of feature misalignment between the classification and regression tasks has been largely overlooked. Most advanced trackers extract features in the same way for both tasks. We argue that this uniform treatment limits the performance gain of visual tracking, because features extracted from the salient area provide more recognizable visual patterns for classification, while those around the boundaries contribute to accurately estimating the target state. We address this problem by proposing two customized feature extractors, named polar pooling and extreme pooling, to capture task-specific visual patterns. Polar pooling enriches the information collected from semantic keypoints for stronger classification, while extreme pooling captures explicit visual patterns of the object boundary for accurate target state estimation. We demonstrate the effectiveness of this task-specific feature representation by integrating it into the recent and advanced tracker RPT. Extensive experiments on several benchmarks show that our Customized Features based RPT (RPT++) achieves new state-of-the-art performance on OTB-100, VOT2018, VOT2019, GOT-10k, TrackingNet and LaSOT.
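The abstract does not specify how extreme pooling is computed. As a hedged illustration of how a pooling operator can expose boundary evidence, the sketch below implements CornerNet-style directional max pooling, a different and well-known technique swapped in here as a stand-in; RPT++'s extreme pooling may work differently. Each output location takes the running maximum of a feature map along one direction, so a location on, for example, the top edge of an object sees the strongest response in the column beneath it. The function name `directional_max_pool` and the (C, H, W) layout are assumptions made for illustration.

```python
import numpy as np


def _running_max(feat: np.ndarray, axis: int, reverse: bool) -> np.ndarray:
    """Cumulative max along `axis`; if reverse, scan from the far end back."""
    if reverse:
        flipped = np.flip(feat, axis=axis)
        return np.flip(np.maximum.accumulate(flipped, axis=axis), axis=axis)
    return np.maximum.accumulate(feat, axis=axis)


def directional_max_pool(feat: np.ndarray) -> dict:
    """Hypothetical CornerNet-style directional pooling on a (C, H, W) map.

    top[c, i, j]    = max(feat[c, i:, j])     (looks downward from row i)
    bottom[c, i, j] = max(feat[c, :i+1, j])   (looks upward from row i)
    left[c, i, j]   = max(feat[c, i, j:])     (looks rightward from col j)
    right[c, i, j]  = max(feat[c, i, :j+1])   (looks leftward from col j)
    """
    if feat.ndim != 3:
        raise ValueError("expected a (C, H, W) feature map")
    return {
        "top": _running_max(feat, axis=1, reverse=True),
        "bottom": _running_max(feat, axis=1, reverse=False),
        "left": _running_max(feat, axis=2, reverse=True),
        "right": _running_max(feat, axis=2, reverse=False),
    }


if __name__ == "__main__":
    # A single strong activation at row 2, column 1 of a 4x4 map.
    feat = np.zeros((1, 4, 4))
    feat[0, 2, 1] = 5.0
    pooled = directional_max_pool(feat)
    print(pooled["top"][0])  # rows 0-2 of column 1 now carry the peak
```

CornerNet sums pairs of such maps (e.g. top plus left) to score corner locations; this sketch only shows the directional propagation itself.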