Reinforcement learning for training end-to-end autonomous driving models in closed-loop simulation is attracting growing attention. However, most simulation environments differ significantly from real-world conditions, creating a substantial simulation-to-reality (sim2real) gap. To bridge this gap, some approaches use scene reconstruction techniques to build photorealistic environments that serve as simulators. While this improves the realism of sensor simulation, these methods remain inherently constrained by the distribution of the training data, making it difficult to render high-quality sensor data for novel trajectories or corner-case scenarios. We therefore propose ReconDreamer-RL, a framework that integrates video diffusion priors into scene reconstruction to aid reinforcement learning, thereby strengthening end-to-end autonomous driving training. Specifically, ReconDreamer-RL introduces ReconSimulator, which combines a video diffusion prior for appearance modeling with a kinematic model for physical modeling, reconstructing driving scenarios from real-world data and narrowing the sim2real gap for both closed-loop evaluation and reinforcement learning. To cover more corner-case scenarios, we introduce the Dynamic Adversary Agent (DAA), which adjusts the trajectories of surrounding vehicles relative to the ego vehicle, autonomously generating corner-case traffic scenarios (e.g., cut-ins). Finally, the Cousin Trajectory Generator (CTG) is proposed to address the bias of the training-data distribution toward simple straight-line driving. Experiments show that ReconDreamer-RL improves end-to-end autonomous driving training, outperforming imitation learning methods with a 5x reduction in the Collision Ratio.
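The abstract does not specify which kinematic model ReconSimulator uses for physical modeling; a common choice in driving simulation is the kinematic bicycle model. The sketch below illustrates that assumption only — the function name `bicycle_step`, the wheelbase, and the timestep are illustrative and not taken from ReconDreamer-RL.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    x: float    # position [m]
    y: float    # position [m]
    yaw: float  # heading [rad]
    v: float    # speed [m/s]

def bicycle_step(s: State, accel: float, steer: float,
                 wheelbase: float = 2.7, dt: float = 0.1) -> State:
    """Advance the ego state one step under a kinematic bicycle model.

    accel is longitudinal acceleration [m/s^2]; steer is the front-wheel
    steering angle [rad]. Hypothetical defaults, for illustration only.
    """
    x = s.x + s.v * math.cos(s.yaw) * dt
    y = s.y + s.v * math.sin(s.yaw) * dt
    yaw = s.yaw + (s.v / wheelbase) * math.tan(steer) * dt
    v = s.v + accel * dt
    return State(x, y, yaw, v)
```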
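As a rough illustration of what the DAA's trajectory adjustment could look like, the sketch below warps a recorded neighbour trajectory into a cut-in just ahead of the ego vehicle. Everything here — the helper `make_cut_in`, its parameters, and the smoothstep blending — is a hypothetical reconstruction, not the paper's actual DAA.

```python
import numpy as np

def make_cut_in(agent_xy: np.ndarray, ego_xy: np.ndarray,
                t_cut: int, ahead: float = 8.0,
                blend: int = 10) -> np.ndarray:
    """Warp a recorded agent trajectory so it cuts in front of the ego.

    agent_xy, ego_xy: (T, 2) arrays of world positions per frame.
    t_cut: frame (>= 1) at which the agent should sit `ahead` metres
    in front of the ego, along the ego's direction of travel.
    """
    out = agent_xy.astype(float)
    # Target point `ahead` metres in front of the ego at t_cut.
    heading = ego_xy[t_cut] - ego_xy[t_cut - 1]
    heading = heading / (np.linalg.norm(heading) + 1e-9)
    shift = (ego_xy[t_cut] + ahead * heading) - agent_xy[t_cut]
    # Ramp the shift in smoothly over `blend` frames and hold it,
    # so the warped trajectory stays kinematically plausible.
    for t in range(len(out)):
        a = float(np.clip((t - (t_cut - blend)) / blend, 0.0, 1.0))
        w = a * a * (3.0 - 2.0 * a)  # smoothstep
        out[t] = agent_xy[t] + w * shift
    return out
```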
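Similarly, a minimal sketch of what the CTG's output could look like: laterally offset "cousin" variants of a recorded ego trajectory, diversifying training beyond straight-line driving. The function `cousin_trajectories` and the offset values are assumptions for illustration, not the paper's method.

```python
import numpy as np

def cousin_trajectories(ego_xy: np.ndarray,
                        offsets=(-2.0, -1.0, 1.0, 2.0)) -> list:
    """Return laterally shifted variants of a recorded ego trajectory.

    ego_xy: (T, 2) array of world positions per frame.
    """
    ego_xy = np.asarray(ego_xy, dtype=float)
    # Unit tangents along the path, then 90-degree normals.
    d = np.gradient(ego_xy, axis=0)
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
    normals = np.stack([-d[:, 1], d[:, 0]], axis=1)
    return [ego_xy + off * normals for off in offsets]
```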