Visual object tracking is an essential capability of intelligent robots. Most existing approaches ignore the online latency that can cause severe performance degradation during real-world processing. Especially for unmanned aerial vehicles (UAVs), where robust tracking is more challenging and onboard computation is limited, the latency issue can be fatal. In this work, we present a simple framework for end-to-end latency-aware tracking, i.e., end-to-end predictive visual tracking (PVT++). PVT++ can turn most leading-edge trackers into predictive trackers by appending an online predictor. Unlike existing model-based solutions, our framework is learnable, so it can take not only motion information as input but also visual cues, or a combination of both. Moreover, since PVT++ is end-to-end optimizable, it can further boost latency-aware tracking performance through joint training. Additionally, this work presents an extended latency-aware evaluation benchmark for assessing any-speed trackers in the online setting. Empirical results on a robotic platform from the aerial perspective show that PVT++ achieves up to a 60% performance gain on various trackers and exhibits better robustness than prior model-based solutions, largely mitigating the degradation caused by latency. Code and models will be made public.
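To make the idea of an appended, learnable online predictor concrete, below is a minimal sketch (not the authors' implementation) in the spirit of PVT++: a small module that fuses past box motion with a visual feature from the tracker backbone and regresses the box offset at a future timestamp. All module names, feature sizes, and the fusion scheme are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LatencyAwarePredictor(nn.Module):
    """Hypothetical predictor appended after a base tracker."""

    def __init__(self, num_past=3, feat_dim=256, hidden=128):
        super().__init__()
        # Motion branch: encodes the last `num_past` box deltas (dx, dy, dw, dh).
        self.motion_enc = nn.Sequential(
            nn.Linear(num_past * 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Visual branch: pools the tracker's search-region feature map.
        self.visual_enc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, hidden), nn.ReLU(),
        )
        # Joint head: predicts the box offset at the future (display) time,
        # so the whole pipeline can be trained end-to-end with the tracker.
        self.head = nn.Linear(2 * hidden, 4)

    def forward(self, past_deltas, visual_feat):
        # past_deltas: (B, num_past, 4); visual_feat: (B, feat_dim, H, W)
        m = self.motion_enc(past_deltas.flatten(1))
        v = self.visual_enc(visual_feat)
        return self.head(torch.cat([m, v], dim=1))  # (B, 4) future box offset


if __name__ == "__main__":
    predictor = LatencyAwarePredictor()
    deltas = torch.randn(2, 3, 4)        # normalized past box motion
    feat = torch.randn(2, 256, 31, 31)   # backbone feature of the search region
    print(predictor(deltas, feat).shape)  # torch.Size([2, 4])
```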
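The latency-aware evaluation mentioned above scores a tracker with the most recent result it has actually finished at each frame's timestamp, so slow trackers are penalized by their own processing delay. The snippet below is a minimal sketch of that matching step under assumed data structures; it is not the benchmark's exact protocol.

```python
def latency_aware_outputs(frame_times, result_times, results):
    """Assign to every frame the latest result finished no later than that
    frame's timestamp (fall back to the first result if none is ready)."""
    assigned, j = [], -1
    for t in frame_times:
        while j + 1 < len(result_times) and result_times[j + 1] <= t:
            j += 1
        assigned.append(results[max(j, 0)])
    return assigned


# Example: a 10 FPS tracker evaluated on a 30 FPS stream only "sees"
# every third frame's result, which is what degrades online accuracy.
frames = [0.000, 0.033, 0.066, 0.100]
finished_at = [0.050, 0.150]
boxes = ["box_0", "box_1"]
print(latency_aware_outputs(frames, finished_at, boxes))
# ['box_0', 'box_0', 'box_0', 'box_0']
```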