Fusing camera and LiDAR information has become a de facto standard for 3D object detection. Current methods rely on point clouds from the LiDAR sensor as queries to leverage features from the image space. However, this underlying assumption makes current fusion frameworks unable to produce any prediction when a LiDAR malfunction occurs, whether minor or severe, which fundamentally limits their deployability in realistic autonomous driving scenarios. In contrast, we propose a surprisingly simple yet novel fusion framework, dubbed BEVFusion, whose camera stream does not depend on the LiDAR input, thereby addressing this downside of previous methods. We empirically show that our framework surpasses state-of-the-art methods under the normal training setting. Under robustness training settings that simulate various LiDAR malfunctions, our framework outperforms state-of-the-art methods by 15.7% to 28.9% mAP. To the best of our knowledge, we are the first to handle realistic LiDAR malfunctions, and our framework can be deployed in realistic scenarios without any post-processing procedure. The code is available at https://github.com/ADLab-AutoDrive/BEVFusion.
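To illustrate the core idea that the camera branch stays independent of the LiDAR input, below is a minimal sketch (not the authors' implementation) of a two-stream design: each stream produces its own bird's-eye-view (BEV) feature map, and a fusion module combines whatever maps are available, falling back to camera-only when the LiDAR output is missing. All module names, channel sizes, and the concatenation-based fusion are illustrative assumptions rather than details from the paper.

```python
# Hypothetical sketch of a two-stream BEV fusion; placeholder encoders keep it runnable.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class CameraBEVStream(nn.Module):
    """Stand-in for a camera encoder that lifts image features to a BEV grid."""

    def __init__(self, out_channels: int = 64, bev_size: int = 128):
        super().__init__()
        self.bev_size = bev_size
        self.proj = nn.Conv2d(3, out_channels, kernel_size=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W). A real model would use a view transformer;
        # here we simply pool to the BEV grid so the sketch executes.
        x = self.proj(images)
        return F.adaptive_avg_pool2d(x, self.bev_size)


class LidarBEVStream(nn.Module):
    """Stand-in for a LiDAR encoder that scatters points onto a BEV grid."""

    def __init__(self, out_channels: int = 64, bev_size: int = 128):
        super().__init__()
        self.out_channels = out_channels
        self.bev_size = bev_size

    def forward(self, points: Optional[torch.Tensor]) -> Optional[torch.Tensor]:
        if points is None:  # simulate a LiDAR malfunction: no point cloud arrives
            return None
        b = points.shape[0]
        # Placeholder: a real model would voxelize and encode the points.
        return torch.zeros(b, self.out_channels, self.bev_size, self.bev_size)


class SimpleBEVFusion(nn.Module):
    """Fuses camera and LiDAR BEV maps; degrades gracefully to camera-only."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, cam_bev: torch.Tensor, lidar_bev: Optional[torch.Tensor]) -> torch.Tensor:
        if lidar_bev is None:
            # The camera stream never queried the LiDAR, so prediction remains possible.
            return cam_bev
        return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))


if __name__ == "__main__":
    cam, lidar, fusion = CameraBEVStream(), LidarBEVStream(), SimpleBEVFusion()
    images = torch.randn(1, 3, 256, 704)
    bev = fusion(cam(images), lidar(None))  # LiDAR dropped out entirely
    print(bev.shape)  # torch.Size([1, 64, 128, 128])
```

The design choice worth noting is that the fallback lives in the fusion stage, not in the camera stream: because the camera branch never uses LiDAR points as queries, a missing point cloud only removes one input to the fusion step instead of breaking the whole pipeline.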