In vision-enabled autonomous systems such as robots and autonomous cars, video object detection plays a crucial role, and both its speed and its accuracy are important for reliable operation. The key insight we show in this paper is that, when it comes to image scaling, speed and accuracy are not necessarily a trade-off: our results show that re-scaling an image to a lower resolution sometimes produces better accuracy. Based on this observation, we propose a novel approach, dubbed AdaScale, which adaptively selects the input image scale so as to improve both the accuracy and the speed of video object detection. On the ImageNet VID and mini YouTube-BoundingBoxes datasets, AdaScale achieves mAP improvements of 1.3 points and 2.7 points with 1.6x and 1.8x speedups, respectively. Additionally, we improve upon state-of-the-art video acceleration work with an extra 1.25x speedup and slightly better mAP on the ImageNet VID dataset.
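The abstract does not spell out how the scale is chosen, so the following is only a hypothetical sketch of what "adaptively selecting the input image scale" could look like, not AdaScale's actual implementation. It assumes a fixed set of candidate scales (shorter-side lengths of 600, 480, 360 and 240 pixels), an offline label that picks the scale with the lowest detection loss for an image (capturing the observation that a lower resolution can sometimes be better), and an online loop that snaps a predicted scale for the next frame onto the candidate set. The names `CANDIDATE_SCALES`, `best_scale_label`, `snap_to_candidate`, `resized_shape`, `adaptive_scales` and the `predict_next_scale` callable are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical candidate scales: target length of the image's shorter side, in pixels.
CANDIDATE_SCALES: Tuple[int, ...] = (600, 480, 360, 240)


def best_scale_label(loss_per_scale: Dict[int, float]) -> int:
    """Return the scale with the lowest (assumed) detection loss for one image.

    Ties are broken in favor of the smaller scale, since it is cheaper to run.
    """
    return min(loss_per_scale, key=lambda s: (loss_per_scale[s], s))


def snap_to_candidate(predicted: float,
                      candidates: Sequence[int] = CANDIDATE_SCALES) -> int:
    """Map a continuous predicted scale onto the nearest allowed scale."""
    return min(candidates, key=lambda s: (abs(s - predicted), s))


def resized_shape(height: int, width: int, target_short_side: int) -> Tuple[int, int]:
    """Resize so the shorter side equals target_short_side, keeping the aspect ratio."""
    factor = target_short_side / min(height, width)
    return round(height * factor), round(width * factor)


def adaptive_scales(num_frames: int,
                    predict_next_scale: Callable[[int, int], float],
                    initial_scale: int = 600) -> List[int]:
    """Choose a scale per frame; the scale for frame t+1 comes from frame t.

    `predict_next_scale` is a hypothetical stand-in for whatever predictor
    looks at the current frame (index t, processed at `scale`).
    """
    scales: List[int] = []
    scale = initial_scale
    for t in range(num_frames):
        scales.append(scale)
        # The detector would run on frame t at `scale` here (omitted).
        scale = snap_to_candidate(predict_next_scale(t, scale))
    return scales


if __name__ == "__main__":
    losses = {600: 0.42, 480: 0.39, 360: 0.40, 240: 0.55}  # made-up numbers
    print(best_scale_label(losses))
    print(resized_shape(720, 1280, 360))
    print(adaptive_scales(4, lambda t, s: s - 100))
```

With the made-up losses above, the lowest-loss label is the 480-pixel scale rather than the largest one, which mirrors the abstract's point that a smaller input is not always less accurate. Snapping predictions to a small discrete set keeps the number of distinct input shapes bounded, which is one reason a design like this might be chosen.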