Vision is one of the primary sensing modalities in autonomous driving. In this paper we look at the problem of estimating the velocity of road vehicles from a camera mounted on a moving car. Unlike prior methods, which train end-to-end deep networks to estimate vehicle velocity directly from video pixels, we propose a two-step approach: first, an off-the-shelf tracker extracts vehicle bounding boxes; then, a small neural network regresses the vehicle velocity from the tracked bounding boxes. Surprisingly, we find that this approach still achieves state-of-the-art estimation performance, with the significant benefit of separating perception from dynamics estimation through a clean, interpretable, and verifiable interface that allows us to distill the statistics crucial for velocity estimation. We show that these statistics can be used to easily generate synthetic training data in the space of bounding boxes, and we use this data to further improve the performance of our method.
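The two-step idea (tracked boxes in, velocity out) and the use of bounding-box statistics to synthesize training data can be illustrated with a toy, pure-Python sketch. Everything below is an assumption for illustration, not the paper's method: the helper names (`synth_track`, `track_feature`, `fit_linear`, `make_dataset`), the pinhole-camera constants, the constant-relative-velocity motion model, and the restriction to longitudinal relative velocity inferred from box width alone. A linear least-squares fit on one hand-crafted feature (the slope of 1/width over time) stands in for the paper's small neural network.

```python
import random

# Hypothetical camera/vehicle constants (assumptions, not taken from the paper).
FOCAL_PX = 1000.0   # focal length in pixels
CAR_WIDTH_M = 1.8   # assumed real-world vehicle width in metres
DT = 0.1            # seconds between consecutive frames


def synth_track(z0, v_rel, n_frames=10, noise_px=0.0, rng=None):
    """Hypothetical generator of synthetic data in bounding-box space.

    Returns box widths (px) for a vehicle starting z0 metres ahead and moving
    with constant relative longitudinal velocity v_rel (m/s), using the
    pinhole relation width = f * W / Z plus optional Gaussian pixel noise.
    """
    if rng is None:
        rng = random.Random(0)
    widths = []
    for t in range(n_frames):
        z = z0 + v_rel * t * DT
        widths.append(FOCAL_PX * CAR_WIDTH_M / z + rng.gauss(0.0, noise_px))
    return widths


def track_feature(widths):
    """Least-squares slope of 1/width over time (proportional to v_rel here)."""
    ts = [t * DT for t in range(len(widths))]
    ys = [1.0 / w for w in widths]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den


def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b (stand-in for a small network)."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    a = num / den
    return a, y_mean - a * x_mean


def make_dataset(n, seed=0):
    """Sample synthetic (feature, velocity) pairs; ranges keep distance > 0."""
    rng = random.Random(seed)
    feats, targets = [], []
    for _ in range(n):
        z0 = rng.uniform(10.0, 50.0)   # initial distance in metres
        v = rng.uniform(-5.0, 5.0)     # relative velocity in m/s
        feats.append(track_feature(synth_track(z0, v, rng=rng)))
        targets.append(v)
    return feats, targets


if __name__ == "__main__":
    feats, targets = make_dataset(200)
    a, b = fit_linear(feats, targets)
    # Estimate relative velocity for a new, noise-free synthetic track.
    v_hat = a * track_feature(synth_track(20.0, 3.0)) + b
    print(f"a={a:.2f}, b={b:.2e}, v_hat={v_hat:.3f} m/s")
```

Under the pinhole model, 1/width grows linearly with distance, so in this noise-free setting the fitted slope `a` should come out close to `FOCAL_PX * CAR_WIDTH_M`. Setting `noise_px > 0` in `synth_track` makes the fit only approximate, which is where a learned model over richer bounding-box statistics, as the abstract proposes, becomes relevant.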