Despite the extensive adoption of machine learning for visual object tracking, recent learning-based approaches have largely overlooked the fact that visual tracking is by nature a sequence-level task; they rely heavily on frame-level training, which inevitably induces inconsistency between training and testing in terms of both data distributions and task objectives. This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms. Our experiments on standard benchmarks, including LaSOT, TrackingNet, and GOT-10k, demonstrate that four representative tracking models (SiamRPN++, SiamAttn, TransT, and TrDiMP) consistently improve when the proposed methods are incorporated in training, without any modification to their architectures.
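To make the contrast with frame-level training concrete, the following is a minimal toy sketch of a sequence-level policy-gradient update. It is not the authors' implementation. The 1-D tracker, the Gaussian policy with a single parameter `theta`, the constant `SIGMA`, the greedy-rollout baseline, and every function name are assumptions introduced purely for illustration. The abstract states only that the method is based on reinforcement learning. The point of the sketch is that the tracker runs over a whole sequence on its own predictions, as it would at test time, and is rewarded with a sequence-level metric (mean IoU) rather than a per-frame loss.

```python
import numpy as np

SIGMA = 0.5  # std of the toy Gaussian policy (illustrative assumption)


def iou_1d(center_a, center_b, width):
    """IoU of two equal-width 1-D intervals given by their centers."""
    inter = max(0.0, width - abs(center_a - center_b))
    return inter / (2.0 * width - inter)


def rollout(theta, targets, width, rng=None):
    """Track one whole sequence, feeding each prediction back as the next start.

    rng=None gives the greedy (test-time) rollout; otherwise actions are sampled.
    Returns (mean IoU over the sequence, d log-prob / d theta summed over frames).
    """
    pos = targets[0]  # initialised from the first-frame annotation
    ious, grad_logp = [], 0.0
    for target in targets[1:]:
        offset = target - pos        # toy stand-in for what the tracker observes
        mean = pos + theta * offset  # policy mean: move a fraction theta of the way
        noise = SIGMA * rng.standard_normal() if rng is not None else 0.0
        pos = mean + noise           # the prediction becomes the next start, so errors accumulate
        grad_logp += (noise / SIGMA**2) * offset  # Gaussian score function w.r.t. theta
        ious.append(iou_1d(pos, target, width))
    return float(np.mean(ious)), grad_logp


def sequence_level_step(theta, targets, width, rng, lr=0.01):
    """One REINFORCE update using the greedy rollout's reward as a baseline."""
    sampled_reward, grad_logp = rollout(theta, targets, width, rng)
    greedy_reward, _ = rollout(theta, targets, width)
    advantage = sampled_reward - greedy_reward
    return theta + lr * advantage * grad_logp


# Usage: a synthetic target drifting to the right, tracked over 30 frames.
rng = np.random.default_rng(0)
targets = np.cumsum(rng.normal(1.0, 0.3, size=30))
theta = 0.2
for _ in range(200):
    theta = sequence_level_step(theta, targets, width=4.0, rng=rng)
print("theta:", theta, "greedy mean IoU:", rollout(theta, targets, width=4.0)[0])
```

Using the greedy rollout as the baseline means that a sampled trajectory is reinforced only when it beats what the tracker would actually do at test time, which ties the training signal to the test-time objective. A real tracker would replace the scalar `theta` with network parameters and the 1-D intervals with bounding boxes.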