Compared with object detection in static images, object detection in videos is more challenging due to degraded image quality. An effective way to address this problem is to exploit temporal context by linking the same object across the video to form tubelets and aggregating classification scores within the tubelets. In this paper, we focus on obtaining high-quality object linking results for better classification. Unlike previous methods that link objects by checking boxes between neighboring frames, we propose to link objects in the same frame. To achieve this goal, we extend prior methods in the following aspects: (1) a cuboid proposal network that extracts spatio-temporal candidate cuboids which bound the movement of objects; (2) a short tubelet detection network that detects short tubelets in short video segments; (3) a short tubelet linking algorithm that links temporally-overlapping short tubelets to form long tubelets. Experiments on the ImageNet VID dataset show that our method outperforms both the static image detector and the previous state of the art. In particular, our method improves results by 8.8% over the static image detector for fast-moving objects.
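To make the third component concrete, here is a minimal sketch of the short-tubelet linking idea: two short tubelets whose frame ranges overlap are merged when their boxes agree (high IoU) on the shared frames. The function names, data layout, and threshold are illustrative assumptions, not the paper's actual algorithm or implementation.

```python
# Hypothetical sketch: link temporally-overlapping short tubelets into a
# longer tubelet when their boxes agree on the shared frames.
# A tubelet is represented as a dict mapping frame index -> box (x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_tubelets(t1, t2, thresh=0.5):
    """Merge two short tubelets if they overlap in time and their boxes
    match on the overlapping frames; return None otherwise.
    (thresh is an illustrative matching threshold.)"""
    shared = sorted(set(t1) & set(t2))
    if not shared:
        return None  # no temporal overlap, cannot link
    mean_iou = sum(iou(t1[f], t2[f]) for f in shared) / len(shared)
    if mean_iou < thresh:
        return None  # boxes disagree on the shared frames
    merged = dict(t1)
    merged.update(t2)  # t2's boxes take precedence on shared frames
    return merged
```

Applied greedily over all pairs of temporally-overlapping short tubelets, repeated merges of this kind grow short tubelets from adjacent video segments into long tubelets spanning the whole video, over which classification scores can then be aggregated.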