4D millimeter-wave (mmWave) radar has been widely adopted in autonomous driving and robot perception due to its low cost and all-weather robustness. However, its inherent sparsity and limited semantic richness significantly constrain perception capability. Recently, fusing camera data with 4D radar has emerged as a promising, cost-effective solution that exploits the complementary strengths of the two modalities. Nevertheless, point-cloud-based radar representations often suffer from information loss introduced by multi-stage signal processing, while directly processing raw 4D radar data incurs prohibitive computational costs. To address these challenges, we propose WRCFormer, a novel 3D object detection framework that fuses raw radar cubes with camera inputs via multi-view representations of the decoupled radar cube. Specifically, we design a Wavelet Attention Module as the building block of a wavelet-based Feature Pyramid Network (FPN) to enhance the representation of sparse radar signals and image data. We further introduce a two-stage, query-based, modality-agnostic fusion mechanism, termed Geometry-guided Progressive Fusion, to efficiently integrate multi-view features from both modalities. Extensive experiments demonstrate that WRCFormer achieves state-of-the-art performance on the K-Radar benchmark, surpassing the previous best model by approximately 2.4% across all scenarios and 1.6% in the sleet scenario, highlighting its robustness under adverse weather conditions.
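To make the wavelet-based enhancement concrete, the following is a minimal PyTorch sketch of what such a wavelet attention block could look like: a single-level Haar decomposition of a feature map, followed by channel re-weighting of the four subbands. This is an illustrative sketch only; the module and parameter names (`WaveletAttention`, `haar_dwt2d`, `gate`, `proj`) are assumptions and may differ from the paper's actual Wavelet Attention Module.

```python
import torch
import torch.nn as nn

def haar_dwt2d(x):
    # Single-level 2D Haar DWT over 2x2 pixel groups (assumes even H and W).
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a + b - c - d) / 2  # horizontal detail
    hl = (a - b + c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh

class WaveletAttention(nn.Module):
    """Hypothetical sketch: channel attention over Haar subbands.
    Names and structure are assumptions, not the paper's implementation."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(4 * channels, 4 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(4 * channels, channels, kernel_size=1)

    def forward(self, x):
        ll, lh, hl, hh = haar_dwt2d(x)
        bands = torch.cat([ll, lh, hl, hh], dim=1)  # (B, 4C, H/2, W/2)
        bands = bands * self.gate(bands)            # re-weight subbands per channel
        return self.proj(bands)                     # fused half-resolution feature
```

One motivation for gating in the wavelet domain is that sparse radar responses concentrate in particular subbands, so a learned per-subband weighting can suppress noise-dominated components before features enter the FPN.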
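The two-stage, query-based fusion can likewise be pictured as a set of object queries attending first to radar-view features and then to image features. The sketch below is an assumed simplification of Geometry-guided Progressive Fusion using standard cross-attention; the names `GeometryGuidedFusion`, `radar_tokens`, and `image_tokens` are illustrative, not drawn from the paper.

```python
import torch.nn as nn

class GeometryGuidedFusion(nn.Module):
    """Hypothetical two-stage query-based fusion sketch. The actual
    Geometry-guided Progressive Fusion mechanism may differ."""
    def __init__(self, dim, num_queries=100, num_heads=8):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.stage1 = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.stage2 = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, radar_tokens, image_tokens):
        # radar_tokens / image_tokens: (B, N, dim) flattened multi-view features.
        q = self.queries.weight.unsqueeze(0).expand(radar_tokens.size(0), -1, -1)
        # Stage 1: queries gather geometric cues from the radar views.
        q, _ = self.stage1(q, radar_tokens, radar_tokens)
        # Stage 2: geometry-informed queries attend to image semantics.
        q, _ = self.stage2(q, image_tokens, image_tokens)
        return q  # object queries for a downstream detection head
```

Because both stages operate on generic token sequences, the same queries can consume features from either modality, which is one plausible reading of the abstract's "modality-agnostic" claim.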