We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements as bird's-eye-view (BEV) maps that encode spatial structure together with radar cross section (RCS) and Doppler attributes. A lightweight recovery step reconstructs point clouds from the generated maps. To better align generation with the visual scene, RadarGen incorporates BEV-aligned depth, semantic, and motion cues extracted from pretrained foundation models, which guide the stochastic generation process toward physically plausible radar patterns. Conditioning on images makes the approach broadly compatible, in principle, with existing visual datasets and simulation frameworks, offering a scalable direction for multimodal generative simulation. Evaluations on large-scale driving data show that RadarGen captures characteristic radar measurement distributions and narrows the gap to perception models trained on real data, marking a step toward unified generative simulation across sensing modalities.
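The following is a minimal sketch, not the authors' implementation, of the BEV representation and lightweight recovery described above: radar detections are rasterized into a map with occupancy, RCS, and Doppler channels, and a point cloud is recovered from occupied cells. Grid extents, resolution, threshold, and function names are illustrative assumptions.

```python
# Hypothetical sketch of radar-to-BEV rasterization and point-cloud recovery.
import numpy as np

def points_to_bev(points, x_range=(0.0, 100.0), y_range=(-50.0, 50.0), res=0.5):
    """points: (N, 4) array of [x, y, rcs, doppler] per radar detection."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((3, h, w), dtype=np.float32)  # channels: occupancy, RCS, Doppler
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    valid = (xi >= 0) & (xi < h) & (yi >= 0) & (yi < w)
    xi, yi, pts = xi[valid], yi[valid], points[valid]
    bev[0, xi, yi] = 1.0          # occupancy
    bev[1, xi, yi] = pts[:, 2]    # RCS attribute
    bev[2, xi, yi] = pts[:, 3]    # Doppler attribute
    return bev

def bev_to_points(bev, x_range=(0.0, 100.0), y_range=(-50.0, 50.0), res=0.5, thresh=0.5):
    """Lightweight recovery: map occupied cells back to points at cell centers."""
    xi, yi = np.nonzero(bev[0] > thresh)
    x = x_range[0] + (xi + 0.5) * res
    y = y_range[0] + (yi + 0.5) * res
    return np.stack([x, y, bev[1, xi, yi], bev[2, xi, yi]], axis=1)
```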