The design of more complex and powerful neural network models has significantly advanced the state of the art in visual object tracking. These advances can be attributed to deeper networks or to the introduction of new building blocks, such as transformers. However, in the pursuit of increased tracking performance, efficient tracking architectures have received surprisingly little attention. In this paper, we introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking. E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU. This is up to 8 times faster than other transformer-based models, making it the only real-time transformer-based tracker. Compared with lightweight trackers that can operate in real time on standard CPUs, E.T.Track consistently outperforms all other methods on the LaSOT, OTB-100, NFS, TrackingNet and VOT-ST2020 datasets. The code will soon be released at https://github.com/visionml/pytracking.