Despite the rich theoretical foundation of model-based deep reinforcement learning (RL) agents, their effectiveness in real-world robotics applications is less studied and understood. In this paper, we therefore investigate how such agents generalize to real-world autonomous-vehicle control tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of time-lap tasks for an F1TENTH racing robot, equipped with a high-dimensional LiDAR sensor, on a set of test tracks with gradually increasing complexity. In this continuous-control setting, we show that model-based agents capable of learning in imagination substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the choice of observation model. Finally, we provide extensive empirical evidence for the effectiveness of model-based agents equipped with sufficiently long memory horizons in sim2real tasks.