We present FEAR, a family of fast, efficient, accurate, and robust Siamese visual trackers. We introduce a novel and efficient way to benefit from a dual-template representation for object model adaptation, which incorporates temporal information with only a single learnable parameter. We further improve the tracker architecture with a pixel-wise fusion block. By plugging in sophisticated backbones with the above-mentioned modules, the FEAR-M and FEAR-L trackers surpass most Siamese trackers on several academic benchmarks in both accuracy and efficiency. Equipped with a lightweight backbone, the optimized version FEAR-XS offers more than 10 times faster tracking than current Siamese trackers while maintaining near state-of-the-art results. The FEAR-XS tracker is 2.4x smaller and 4.3x faster than LightTrack, with superior accuracy. In addition, we expand the definition of model efficiency by introducing the FEAR benchmark, which assesses energy consumption and execution speed. We show that energy consumption is a limiting factor for trackers on mobile devices. Source code, pretrained models, and the evaluation protocol are available at https://github.com/PinataFarms/FEARTracker.
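The dual-template idea above can be illustrated with a minimal sketch: the initial (static) template features are blended with features from a recent frame (dynamic template) through a single learnable scalar. The sigmoid-gated convex combination below is an assumption for illustration; the paper's exact parameterization may differ, and `fuse_templates` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(x):
    # numerically standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def fuse_templates(static_feat, dynamic_feat, w):
    """Blend static and dynamic template features with one learnable
    scalar w (hypothetical formulation, not the paper's exact one).

    gate = sigmoid(w) in (0, 1) controls how much temporal
    information from the dynamic template enters the object model.
    """
    gate = sigmoid(w)
    return (1.0 - gate) * static_feat + gate * dynamic_feat

# Toy example: 256-d template embeddings.
rng = np.random.default_rng(0)
static_feat = rng.standard_normal(256).astype(np.float32)
dynamic_feat = rng.standard_normal(256).astype(np.float32)

# With w = 0 the gate is 0.5, i.e. an equal blend of both templates.
fused = fuse_templates(static_feat, dynamic_feat, w=0.0)
```

Because only the scalar `w` is learned, this adds negligible parameters and compute on top of the shared backbone, which is consistent with the efficiency claims above.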