World models learn behaviors in a latent imagination space to enhance the sample efficiency of deep reinforcement learning (RL) algorithms. While learning world models from high-dimensional observations (e.g., pixel inputs) has become practical on standard RL benchmarks and some games, their effectiveness in real-world robotics applications has not been explored. In this paper, we investigate how such agents generalize to real-world autonomous vehicle control tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of time-lap tasks for an F1TENTH racing robot, equipped with a high-dimensional LiDAR sensor, on a set of test tracks of gradually increasing complexity. In this continuous-control setting, we show that model-based agents capable of learning in imagination substantially outperform model-free agents in performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the choice of their observation model. We provide extensive empirical evidence for the effectiveness of world models equipped with sufficiently long memory horizons in sim2real tasks.