Observing that Semantic features learned in an image classification task and Appearance features learned in a similarity matching task complement each other, we build a twofold Siamese network, named SA-Siam, for real-time object tracking. SA-Siam is composed of a semantic branch and an appearance branch, each of which is a similarity-learning Siamese network. An important design choice in SA-Siam is to train the two branches separately so as to preserve the heterogeneity of the two types of features. In addition, we propose a channel attention mechanism for the semantic branch, in which channel-wise weights are computed from the channel activations around the target position. While the architecture inherited from SiamFC \cite{SiamFC} allows our tracker to run faster than real time, the twofold design and the attention mechanism significantly improve the tracking performance. The proposed SA-Siam outperforms all other real-time trackers by a large margin on the OTB-2013/50/100 benchmarks.
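The channel attention idea in the abstract (channel-wise weights computed from the channel activations around the target position) can be illustrated with a minimal NumPy sketch. Everything beyond that one sentence is an assumption made for illustration: the function names `channel_weights` and `apply_attention`, the use of max pooling inside the target box and over the whole map, the small shared two-layer network with weights `w1` and `w2`, and the sigmoid output are not taken from the abstract and may differ from the actual SA-Siam design.

```python
import numpy as np


def channel_weights(features, target_box, w1, w2):
    """Compute one weight in (0, 1) per channel (illustrative sketch only).

    features:   array of shape (C, H, W), the semantic feature map.
    target_box: (top, left, height, width) in feature-map cells (assumed layout).
    w1, w2:     hypothetical weights of a small two-layer network shared by
                all channels, with shapes (2, hidden) and (hidden, 1).
    """
    top, left, h, w = target_box
    # Assumption: summarize each channel by its peak activation inside the
    # target region and its peak activation over the whole map.
    inside = features[:, top:top + h, left:left + w].max(axis=(1, 2))
    whole = features.max(axis=(1, 2))
    descriptor = np.stack([inside, whole], axis=1)   # shape (C, 2)

    hidden = np.maximum(descriptor @ w1, 0.0)        # ReLU, shape (C, hidden)
    logits = hidden @ w2                             # shape (C, 1)
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))       # sigmoid, shape (C,)


def apply_attention(features, weights):
    """Rescale each channel of a (C, H, W) feature map by its weight."""
    return features * weights[:, None, None]
```

In this sketch the weights depend only on the target features, so they would be computed once per target and reused; whether SA-Siam does the same is not stated in the abstract.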