We present LidarDM, a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos. LidarDM stands out with two unprecedented capabilities in LiDAR generative modeling: (i) LiDAR generation guided by driving scenarios, offering significant potential for autonomous driving simulations, and (ii) 4D LiDAR point cloud generation, enabling the creation of realistic and temporally coherent sequences. At the heart of our model is an integrated 4D world generation framework. Specifically, we employ latent diffusion models to generate the 3D scene, combine it with dynamic actors to form the underlying 4D world, and subsequently produce realistic sensory observations within this virtual environment. Our experiments indicate that our approach outperforms competing algorithms in realism, temporal coherency, and layout consistency. We additionally show that LidarDM can be used as a generative world model simulator for training and testing perception models.
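The three-stage pipeline described above (diffusion-based scene generation, 4D world composition with dynamic actors, and sensor rendering) can be sketched at a very high level as follows. This is a minimal illustrative skeleton, not the authors' implementation; every class and function name here is a hypothetical placeholder, and each stage is stubbed out.

```python
# Hypothetical sketch of the LidarDM pipeline's three stages.
# All names (Scene3D, World4D, generate_scene, etc.) are illustrative
# placeholders, not the paper's actual API; each stage is stubbed.

from dataclasses import dataclass, field

@dataclass
class Scene3D:
    """Static 3D scene, as would be produced by the latent diffusion model."""
    geometry: list = field(default_factory=list)

@dataclass
class World4D:
    """Static scene combined with dynamic actors across T timesteps."""
    scene: Scene3D
    actor_tracks: list  # one per-timestep trajectory per actor

def generate_scene(layout):
    # Stage 1 (stub): a latent diffusion model would map a driving-scenario
    # layout (e.g. a semantic map) to static 3D scene geometry.
    return Scene3D(geometry=[f"mesh_for_{x}" for x in layout])

def compose_world(scene, actors, num_steps):
    # Stage 2 (stub): attach a per-timestep trajectory for each dynamic
    # actor to the static scene, yielding the underlying 4D world.
    tracks = [[(a, t) for t in range(num_steps)] for a in actors]
    return World4D(scene=scene, actor_tracks=tracks)

def render_lidar(world, num_steps):
    # Stage 3 (stub): ray-cast the 4D world at each timestep to produce a
    # temporally coherent sequence of LiDAR point clouds.
    return [f"point_cloud_t{t}" for t in range(num_steps)]

layout = ["road", "sidewalk", "building"]
scene = generate_scene(layout)
world = compose_world(scene, actors=["car", "pedestrian"], num_steps=5)
frames = render_lidar(world, num_steps=5)
print(len(frames))  # one LiDAR frame per timestep
```

Because the 4D world is generated first and observations are rendered from it afterward, consecutive frames share the same underlying scene and actor trajectories, which is what yields the temporal coherence the abstract emphasizes.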