Tracking based on deep neural networks has improved greatly with the emergence of Siamese trackers. However, the target's appearance often changes during tracking, which can reduce the tracker's robustness under challenges such as aspect-ratio change, occlusion, and scale variation. In addition, cluttered backgrounds can produce multiple high-response points in the response map, leading to incorrect target localization. In this paper, we propose DASTSiam, which improves Siamese tracking with two transformer-based modules: the Spatio-Temporal (ST) fusion module and the Discriminative Augmentation (DA) module. The ST module accumulates historical cues through cross-attention to improve robustness against changes in object appearance, while the DA module associates semantic information between the template and the search region to improve target discrimination. Moreover, modifying the label assignment of anchors further improves the reliability of object localization. Our modules can be used with any Siamese tracker, and comparative and ablation experiments show improved performance on several public datasets.
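The abstract names cross-attention as the mechanism behind the ST module but gives no further detail. The following is a minimal sketch of plain scaled dot-product cross-attention under our own assumptions, not the DASTSiam implementation. Features are lists of per-token vectors. There are no learned query/key/value projections and no multiple heads. Historical frames are simply concatenated into one memory, and the result is fused through a residual addition. The helper names (`cross_attention`, `fuse_with_history`, `softmax_rows`, `matmul`) are hypothetical.

```python
import math
from typing import List

# Token features: one row (feature vector) per spatial token.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(query: Matrix, memory: Matrix) -> Matrix:
    """Scaled dot-product cross-attention with identity projections (assumption).

    Each query token attends over all memory tokens; the memory tokens serve
    as both keys and values.
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    # scores[i][j] = <query_i, memory_j> / sqrt(d)
    scores = [[sum(q * k for q, k in zip(q_tok, m_tok)) * scale for m_tok in memory]
              for q_tok in query]
    weights = softmax_rows(scores)
    return matmul(weights, memory)


def fuse_with_history(current: Matrix, history: List[Matrix]) -> Matrix:
    """Hypothetical ST-style fusion of the current template with historical cues.

    Tokens from all stored frames are concatenated into one memory, the current
    template attends to it, and the attended features are added back residually.
    """
    memory = [token for frame in history for token in frame]
    attended = cross_attention(current, memory)
    return [[c + a for c, a in zip(c_row, a_row)]
            for c_row, a_row in zip(current, attended)]


if __name__ == "__main__":
    current = [[1.0, 0.0], [0.0, 1.0]]
    history = [[[0.5, 0.5]], [[1.0, -1.0]]]
    print(fuse_with_history(current, history))
```

Calling the same `cross_attention` with search-region tokens as `query` and template tokens as `memory` gives a rough picture of how the DA module might associate the two branches. This is also an assumption, since the abstract does not describe the DA module's internals.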