While most recent autonomous driving systems focus on developing perception methods based on ego-vehicle sensors, an alternative approach is often overlooked: leveraging intelligent roadside cameras to extend the perception range beyond the ego vehicle's visual limits. We discover that state-of-the-art vision-centric bird's eye view detection methods perform poorly on roadside cameras. This is because these methods mainly focus on recovering the depth with respect to the camera center, where the depth difference between the car and the ground quickly shrinks as the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight, to address this issue. In essence, instead of predicting the pixel-wise depth, we regress the height to the ground to achieve a distance-agnostic formulation, thereby easing the optimization of camera-only perception methods. On popular 3D detection benchmarks for roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. The code is available at {\url{https://github.com/ADLab-AutoDrive/BEVHeight}}.
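As a minimal sketch of why the depth target degenerates with distance (an illustration under a simplified flat-ground geometry; the symbols $h_{\text{cam}}$, $h_{\text{car}}$, and $x$ are introduced here and are not notation from the paper): consider a roadside camera mounted $h_{\text{cam}}$ above the ground and a car roof $h_{\text{car}}$ above the ground, both at horizontal distance $x$ from the camera. The depth gap between the roof point and the ground point, measured from the camera center, is
\[
\Delta d \;=\; \sqrt{x^2 + h_{\text{cam}}^2} \;-\; \sqrt{x^2 + (h_{\text{cam}} - h_{\text{car}})^2}
\;=\; \frac{h_{\text{car}}\,(2h_{\text{cam}} - h_{\text{car}})}{\sqrt{x^2 + h_{\text{cam}}^2} + \sqrt{x^2 + (h_{\text{cam}} - h_{\text{car}})^2}}
\;\approx\; \frac{h_{\text{car}}\,(2h_{\text{cam}} - h_{\text{car}})}{2x}
\quad\text{for } x \gg h_{\text{cam}},
\]
so the depth gap a depth-regression detector must resolve decays roughly as $1/x$, while the height-above-ground target $h_{\text{car}}$ stays constant regardless of distance.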