Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we instead consider an end-to-end approach via reinforcement learning (RL). However, collecting exploration driving data in the real world is impractical and dangerous. While training in simulation and deploying visual sim-to-real techniques has worked well for robot manipulation, deploying beyond controlled workspace viewpoints remains a challenge. In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving without using any real-world data. This is done by learning to translate randomized simulation images into simulated segmentation and depth maps, which in turn enables real-world images to be translated as well. This allows us to train an end-to-end RL policy in simulation and deploy it directly in the real world. Our approach, which can be trained in 48 hours on a single GPU, performs as well as a classical perception and control stack that took thousands of engineering hours over several months to build. We hope this work motivates future end-to-end autonomous driving research.
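As a rough illustration of the mechanism described above (a sketch, not the authors' implementation), the code below shows a Sim2Seg-style translation network in PyTorch: an encoder-decoder maps a domain-randomized RGB rendering to per-pixel segmentation logits and a depth map, supervised by the simulator's free ground-truth labels. An RL policy would then consume only this canonical segmentation-and-depth representation, so real images can pass through the same translator at deployment. All class names, layer sizes, and the loss weighting here are hypothetical.

```python
# Minimal sketch of a Sim2Seg-style translator (hypothetical names and sizes).
import torch
import torch.nn as nn

class Sim2SegTranslator(nn.Module):
    """Encoder-decoder mapping an RGB image to segmentation logits and depth,
    trained on paired, domain-randomized simulation renderings."""
    def __init__(self, num_classes: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )
        self.depth_head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb: torch.Tensor):
        z = self.encoder(rgb)
        return self.seg_head(z), self.depth_head(z)

def translation_loss(model, rgb, seg_gt, depth_gt):
    """Supervised loss: the simulator labels every randomized rendering,
    so no real-world annotation is needed. Equal weighting is an assumption."""
    seg_logits, depth_pred = model(rgb)
    seg_loss = nn.functional.cross_entropy(seg_logits, seg_gt)
    depth_loss = nn.functional.l1_loss(depth_pred, depth_gt)
    return seg_loss + depth_loss

# Example training step on synthetic data shaped like sim output:
model = Sim2SegTranslator()
rgb = torch.randn(2, 3, 64, 64)             # randomized sim renderings
seg_gt = torch.randint(0, 8, (2, 64, 64))   # per-pixel class labels from the simulator
depth_gt = torch.rand(2, 1, 64, 64)         # ground-truth depth from the simulator
loss = translation_loss(model, rgb, seg_gt, depth_gt)
loss.backward()
```

The design choice this sketch reflects is that the reality gap is closed entirely in the canonical space: because both randomized sim images and real images are reduced to segmentation and depth before the policy sees them, the policy trained purely in simulation can be deployed on real observations unchanged.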