Accurate 7DoF pose prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users. In principle, this could be achieved by a single camera system capable of detecting the pose of each vehicle, but this would require a large, accurately labelled dataset from which to train the detector. Although large vehicle pose datasets exist (ostensibly developed for autonomous vehicles), we find training on these datasets inadequate. These datasets contain images from a ground-level viewpoint, whereas an ideal view for intersection observation would be elevated above the road surface. We develop an alternative approach using a weakly supervised method of fine-tuning 3D object detectors for traffic observation cameras, showing in the process that large existing autonomous vehicle datasets can be leveraged for pre-training. To fine-tune the monocular 3D object detector, our method utilises multiple 2D detections from overlapping, wide-baseline views and a loss that encodes the underlying geometric consistency. Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top-performing monocular 3D object detectors on autonomous vehicle datasets. We present our training methodology, multi-view reprojection loss, and dataset.
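To make the idea of a multi-view reprojection loss concrete, the following is a minimal numpy sketch, not the paper's implementation: it assumes calibrated pinhole cameras and matched 2D detection centres per view, and penalises the pixel distance between a predicted 3D box centre reprojected into each view and the corresponding 2D detection. All function and variable names here are illustrative.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X into pixel coordinates for a
    camera with intrinsics K and extrinsics (R, t)."""
    x_cam = R @ X + t            # world -> camera frame
    x_img = K @ x_cam            # camera -> homogeneous pixels
    return x_img[:2] / x_img[2]  # perspective divide

def multiview_reprojection_loss(X_pred, cameras, detections):
    """Sum of squared pixel errors between the reprojected 3D box
    centre and the matched 2D detection centre in each view.

    cameras:    list of (K, R, t) tuples, one per overlapping view
    detections: list of 2D detection centres (u, v), same order
    """
    loss = 0.0
    for (K, R, t), uv_det in zip(cameras, detections):
        uv = project(K, R, t, X_pred)
        loss += float(np.sum((uv - uv_det) ** 2))
    return loss

# Two wide-baseline views observing the same point.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])
cameras = [(K, np.eye(3), np.zeros(3)),
           (K, np.eye(3), np.array([1.0, 0.0, 0.0]))]
detections = [np.array([320.0, 240.0]),
              np.array([420.0, 240.0])]

# A geometrically consistent prediction incurs zero loss;
# perturbing it increases the loss in every view it disagrees with.
consistent = multiview_reprojection_loss(np.array([0.0, 0.0, 5.0]),
                                         cameras, detections)
perturbed = multiview_reprojection_loss(np.array([0.1, 0.0, 5.0]),
                                        cameras, detections)
```

In the weakly supervised setting the abstract describes, a loss of this shape provides a training signal from 2D detections alone: no 3D labels are needed, only camera calibration and cross-view association.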