Driven by deep learning techniques, perception technology in autonomous driving has developed rapidly in recent years. To achieve accurate and robust perception, autonomous vehicles are typically equipped with multiple sensors, making sensor fusion a crucial part of the perception system. Among these sensors, radars and cameras provide complementary and cost-effective perception of the surrounding environment regardless of lighting and weather conditions. This review aims to provide a comprehensive guideline for radar-camera fusion, with a particular focus on perception tasks related to object detection and semantic segmentation. Starting from the principles of the radar and camera sensors, we delve into data processing pipelines and representations, followed by an in-depth analysis and summary of radar-camera fusion datasets. In reviewing radar-camera fusion methodologies, we address the guiding questions of "why to fuse", "what to fuse", "where to fuse", "when to fuse", and "how to fuse", and subsequently discuss various challenges and potential research directions in this domain. To ease the retrieval and comparison of datasets and fusion methods, we also provide an interactive website: https://XJTLU-VEC.github.io/Radar-Camera-Fusion.