Accurate tracking is still a challenging task due to appearance variations, pose and view changes, and geometric deformations of the target in videos. Recent anchor-free trackers provide an efficient regression mechanism but fail to produce precise bounding box estimates. To address these issues, this paper repurposes a Transformer-like regression branch, termed Target Transformed Regression (TREG), for accurate anchor-free tracking. The core of TREG is to model pair-wise relations between elements in the target template and the search region, and to use the resulting target-enhanced visual representation for accurate bounding box regression. This target-contextualized representation enhances target-relevant information to help precisely locate box boundaries, and handles object deformation to some extent thanks to its local and dense matching mechanism. In addition, we devise a simple online template update mechanism that selects reliable templates, increasing robustness to appearance variations and geometric deformations of the target over time. Experimental results on visual tracking benchmarks including VOT2018, VOT2019, OTB100, GOT10k, NFS, UAV123, LaSOT and TrackingNet demonstrate that TREG obtains state-of-the-art performance, achieving a success rate of 0.640 on LaSOT, while running at around 30 FPS. The code and models will be made available at https://github.com/MCG-NJU/TREG.
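The pair-wise relation modeling described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the paper's implementation: it assumes an attention-style formulation in which every search-region element computes affinities with all template elements, and the aggregated template context is fused back into the search features before box regression. The function name and residual fusion are illustrative assumptions.

```python
import numpy as np

def target_transformed_features(template_feats, search_feats):
    """Hypothetical sketch of pair-wise relation modeling between
    target template and search region (attention-style assumption).

    template_feats: (Nt, C) flattened target-template features
    search_feats:   (Ns, C) flattened search-region features
    Returns a (Ns, C) target-contextualized representation.
    """
    C = template_feats.shape[1]
    # Dense pairwise affinities between search and template elements
    scores = search_feats @ template_feats.T / np.sqrt(C)   # (Ns, Nt)
    # Softmax over template elements (numerically stabilized)
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # Aggregate template context and fuse it back (residual assumption)
    context = attn @ template_feats                          # (Ns, C)
    return search_feats + context
```

In this sketch, the enhanced features would then feed a regression head that predicts box boundaries at each search location, which is where the dense, local matching helps with deformation.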