Recent advances in camera-based bird's eye view (BEV) representation show great potential for in-vehicle 3D perception. Despite the substantial progress achieved on standard benchmarks, the robustness of BEV algorithms, which is critical for safe operation, has not been thoroughly examined. To bridge this gap, we introduce RoboBEV, a comprehensive benchmark suite that encompasses eight distinct corruptions: Bright, Dark, Fog, Snow, Motion Blur, Color Quant, Camera Crash, and Frame Lost. Using this suite, we undertake extensive evaluations across a wide range of BEV-based models to understand their resilience and reliability. Our findings indicate a strong correlation between absolute performance on in-distribution and out-of-distribution datasets. Nonetheless, relative performance varies considerably across different approaches. Our experiments further demonstrate that pre-training and depth-free BEV transformation have the potential to enhance out-of-distribution robustness. Additionally, utilizing long and rich temporal information substantially improves robustness. Our findings provide valuable insights for designing future BEV models that achieve both accuracy and robustness in real-world deployments.