The use of cameras for vehicle speed measurement is far more cost-effective than other technologies such as inductive loops, radar, or laser. However, accurate speed measurement remains a challenge due to the inherent limitations of cameras in providing accurate range estimates. In addition, classical vision-based methods are very sensitive to the extrinsic calibration between the camera and the road. In this context, data-driven approaches appear as an interesting alternative. However, data collection requires a complex and costly setup to record videos under real traffic conditions from a camera synchronized with a high-precision speed sensor that generates the ground-truth speed values. It has recently been demonstrated that driving simulators (e.g., CARLA) can serve as a robust alternative for generating large synthetic datasets, enabling the application of deep learning techniques to vehicle speed estimation from a single camera. In this paper, we study the same problem using multiple cameras placed at different virtual locations and with different extrinsic parameters. We address the question of whether complex 3D-CNN architectures are capable of implicitly learning view-invariant speed estimation with a single model, or whether view-specific models are more appropriate. The results are very promising, as they show that a single model trained on data from multiple views achieves even better accuracy than camera-specific models, paving the way towards a view-invariant vehicle speed measurement system.
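To make the setup concrete, the following is a minimal illustrative sketch (not the architecture used in this work) of a 3D-CNN that regresses a scalar speed value from a short video clip, assuming PyTorch and input clips shaped (batch, channels, frames, height, width); the class name and layer sizes are hypothetical.

```python
# Illustrative sketch only: a small 3D-CNN speed regressor.
import torch
import torch.nn as nn

class Speed3DCNN(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),      # downsample spatially
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                   # collapse time and space
        )
        self.regressor = nn.Linear(32, 1)              # scalar speed (e.g., km/h)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip)
        return self.regressor(x.flatten(1))

# Usage: a batch of 16-frame RGB clips, which could come from any of the
# virtual cameras when training a single multi-view model.
model = Speed3DCNN()
clips = torch.randn(4, 3, 16, 112, 112)   # (batch, C, T, H, W)
speeds = model(clips)                      # shape (4, 1)
```

In the single-model setting studied here, clips from all camera views would be pooled into one training set, whereas view-specific models would each be trained only on clips from their own camera.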