Similarity matching is a core operation in Siamese trackers. Most Siamese trackers perform similarity learning via cross correlation, a technique that originates from the image matching field. However, unlike 2-D image matching, the matching network in object tracking requires 4-D information (height, width, channel, and time). Cross correlation neglects the information in the channel and time dimensions and thus produces ambiguous matching. This paper proposes a spatio-temporal matching process that thoroughly explores the capability of 4-D matching in space (height, width, and channel) and time. For spatial matching, we introduce a space-variant channel-guided correlation (SVC-Corr) that recalibrates channel-wise feature responses at each spatial location and thereby guides the generation of target-aware matching features. For temporal matching, we investigate the time-domain context relations between the target and the background and develop an aberrance repressed module (ARM). By restricting abrupt changes in the inter-frame response maps, ARM suppresses aberrances and thus enables more robust and accurate object tracking. Furthermore, a novel anchor-free tracking framework is presented to accommodate these innovations. Experiments on challenging benchmarks, including OTB100, VOT2018, VOT2020, GOT-10k, and LaSOT, demonstrate the state-of-the-art performance of the proposed method.
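For illustration only, the plain cross correlation that the abstract contrasts with can be sketched as below. This is a minimal NumPy sketch of the generic baseline operation, not the paper's SVC-Corr or ARM; the function name `cross_correlation`, the array shapes, and the toy inputs are assumptions made for the example.

```python
import numpy as np


def cross_correlation(search, template):
    """Naive cross correlation of a template over a search feature map.

    search:   array of shape (C, Hs, Ws)
    template: array of shape (C, Ht, Wt), with Ht <= Hs and Wt <= Ws
    returns:  response map of shape (Hs - Ht + 1, Ws - Wt + 1)

    Hypothetical helper for illustration; products are summed over
    channels and space, so the output has no channel axis.
    """
    c, hs, ws = search.shape
    ct, ht, wt = template.shape
    if c != ct:
        raise ValueError("search and template must have the same channel count")
    out = np.zeros((hs - ht + 1, ws - wt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Window of the search features aligned with the template.
            window = search[:, i:i + ht, j:j + wt]
            out[i, j] = np.sum(window * template)
    return out


# Toy input (assumed): a 3x3 block of ones embedded in an all-zero search map.
search = np.zeros((4, 8, 8))
search[:, 2:5, 3:6] = 1.0
template = np.ones((4, 3, 3))

response = cross_correlation(search, template)
peak = np.unravel_index(np.argmax(response), response.shape)
```

In this sketch the response map has a single channel: the contributions of all four feature channels are summed into one score per location, which is the sense in which plain cross correlation discards channel information. Each frame is also processed independently, so no temporal context enters the score.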