The currently popular two-stream, two-stage tracking framework extracts template and search-region features separately and then performs relation modeling; the extracted features therefore lack awareness of the target and have limited target-background discriminability. To tackle this issue, we propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. In this way, discriminative target-oriented features can be dynamically extracted through mutual guidance. Since no extra heavy relation modeling module is needed and the implementation is highly parallelized, the proposed tracker runs at a fast speed. To further improve inference efficiency, an in-network candidate early elimination module is proposed based on the strong similarity prior computed within the one-stream framework. As a unified framework, OSTrack achieves state-of-the-art performance on multiple benchmarks; in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k, achieving 73.7% AO and improving the existing best result (SwinTrack) by 4.3%. Besides, our method maintains a good performance-speed trade-off and shows faster convergence. The code and models are available at https://github.com/botaoye/OSTrack.
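The two mechanisms summarized above — joint feature learning and relation modeling via self-attention over the concatenated template and search tokens, and candidate early elimination driven by the resulting attention similarity — can be illustrated with a minimal numpy sketch. This is a toy single-head, single-layer illustration under assumed dimensions, not the paper's implementation; in particular, the keep-score here (mean attention received from all template tokens) and the `keep_ratio` value are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention(tokens, Wq, Wk, Wv):
    """One attention layer over the concatenated [template; search] tokens.
    Because template and search tokens attend to each other in the same
    operation, feature extraction and relation modeling are unified."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return attn @ V, attn

def eliminate_candidates(attn, n_template, keep_ratio=0.7):
    """Candidate early elimination (toy version): score each search token
    by the mean attention it receives from template tokens and keep only
    the top fraction for subsequent layers."""
    scores = attn[:n_template, n_template:].mean(axis=0)
    k = max(1, round(keep_ratio * scores.size))
    return np.sort(np.argsort(scores)[::-1][:k])  # indices of kept search tokens

rng = np.random.default_rng(0)
d = 16
template = rng.standard_normal((4, d))   # 4 template tokens
search = rng.standard_normal((10, d))    # 10 search-region tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

tokens = np.concatenate([template, search])
feats, attn = joint_attention(tokens, Wq, Wk, Wv)
keep = eliminate_candidates(attn, n_template=4, keep_ratio=0.7)
# only the surviving search tokens are passed to the next block
pruned = np.concatenate([feats[:4], feats[4:][keep]])
```

Because pruning happens inside the network, later layers process fewer tokens, which is where the inference speedup comes from: background candidates are discarded before they incur further attention cost.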