In surveillance and search-and-rescue applications, it is important to perform multi-object tracking (MOT) in real time on low-end devices. Today's MOT solutions employ deep neural networks, which tend to have high computational complexity. Recognizing the effect of frame size on tracking performance, we propose DeepScale, a model-agnostic frame-size selection approach that operates on top of existing fully convolutional network-based trackers to accelerate tracking throughput. In the training stage, we incorporate detectability scores into a one-shot tracker architecture so that DeepScale can learn representation estimation for different frame sizes in a self-supervised manner. During inference, based on user-controlled parameters, it finds a suitable trade-off between tracking accuracy and speed by adapting frame sizes at run time. Extensive experiments and benchmark tests on MOT datasets demonstrate the effectiveness and flexibility of DeepScale. Compared to a state-of-the-art tracker, DeepScale++, a variant of DeepScale, achieves a 1.57X speedup with only moderate degradation (~2.4) in tracking accuracy on the MOT15 dataset in one configuration.
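To make the inference-time idea concrete, the following is a minimal, hypothetical sketch of run-time frame-size selection under a user-controlled accuracy/speed trade-off. All names and values here (`select_frame_size`, `est_accuracy`, `est_latency`, `alpha`, the candidate resolutions) are illustrative assumptions, not DeepScale's actual API or numbers.

```python
# Hypothetical sketch: choose an input frame size by weighing an estimated
# tracking-quality score against normalized speed, controlled by alpha.
from typing import Dict, Tuple

def select_frame_size(
    est_accuracy: Dict[Tuple[int, int], float],  # estimated tracking quality per frame size
    est_latency: Dict[Tuple[int, int], float],   # estimated per-frame latency in seconds
    alpha: float,                                # user-controlled weight in [0, 1]; higher favors accuracy
) -> Tuple[int, int]:
    """Return the frame size maximizing a weighted accuracy/speed utility."""
    fastest = min(est_latency.values())

    def utility(size: Tuple[int, int]) -> float:
        speed = fastest / est_latency[size]  # normalized to (0, 1], 1 = fastest candidate
        return alpha * est_accuracy[size] + (1.0 - alpha) * speed

    return max(est_accuracy, key=utility)

# Example with made-up estimates for three candidate resolutions.
accuracy = {(1088, 608): 0.62, (864, 480): 0.59, (576, 320): 0.53}
latency = {(1088, 608): 0.050, (864, 480): 0.033, (576, 320): 0.020}
print(select_frame_size(accuracy, latency, alpha=0.9))  # larger alpha -> larger frames
```

Sweeping `alpha` from 0 to 1 traces out the accuracy/throughput trade-off curve, which is the kind of user-controlled knob the abstract refers to.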