Correlation plays a critical role in the field of visual tracking, especially in the recently popular Siamese-based trackers. The correlation operation is a simple fusion method that considers the similarity between the template and the search region. However, correlation is a local linear matching process that loses semantic information and easily falls into a local optimum, which may be a bottleneck in designing high-accuracy tracking algorithms. In this work, to determine whether a better feature fusion method than correlation exists, we present a novel attention-based feature fusion network inspired by the Transformer. This network effectively combines the template and search-region features using attention. Specifically, the proposed method includes an ego-context augment module based on self-attention and a cross-feature augment module based on cross-attention. First, we present a Transformer tracking method (named TransT) built on a Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and a classification and regression head. On top of the TransT baseline, we further design a segmentation branch to generate an accurate mask. Finally, we propose a stronger version of TransT, named TransT-M, which extends TransT with a multi-template scheme and an IoU prediction head. Experiments show that our TransT and TransT-M methods achieve promising results on seven popular datasets. Code and models are available at https://github.com/chenxin-dlut/TransT-M.
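To make the fusion idea concrete, below is a minimal PyTorch sketch of attention-based template/search fusion with a self-attention (ECA-style) module and a cross-attention (CFA-style) module. This is not the authors' implementation; the class names, single-layer structure, feature dimensions, and the final fusion call are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EgoContextAugment(nn.Module):
    """Self-attention over one feature map (hypothetical sketch of an ECA-style module)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):            # x: (seq_len, batch, dim)
        out, _ = self.attn(x, x, x)  # query = key = value = x
        return self.norm(x + out)    # residual connection + layer norm

class CrossFeatureAugment(nn.Module):
    """Cross-attention: queries from one branch, keys/values from the other (CFA-style sketch)."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)
        self.norm = nn.LayerNorm(dim)

    def forward(self, q, kv):         # q from one branch, kv from the other
        out, _ = self.attn(q, kv, kv)
        return self.norm(q + out)

# Fusing template and search-region features by attention instead of correlation
# (shapes are assumptions: flattened feature maps of (HW, batch, channels)):
template = torch.randn(64, 1, 256)   # flattened template features
search   = torch.randn(256, 1, 256)  # flattened search-region features
eca = EgoContextAugment()
cfa = CrossFeatureAugment()
fused = cfa(eca(search), eca(template))  # search queries attend to template context
```

Unlike a correlation map, which reduces the two feature maps to local similarity scores, the cross-attention output here remains a full feature sequence, so semantic information from both branches is preserved for the downstream prediction heads.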