We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initialize the representations for the RL policy and value function from a large prior dataset of other robots navigating in other environments (at low speed), which provides a navigation-relevant representation. From here, a sample-efficient online RL method uses a single low-speed user-provided demonstration to determine the desired driving course, extracts a set of navigational checkpoints, and autonomously practices driving through these checkpoints, resetting automatically on collision or failure. Perhaps surprisingly, we find that with appropriate initialization and choice of algorithm, our system can learn to drive over a variety of racing courses in less than 20 minutes of online training. The resulting policies exhibit emergent aggressive driving skills, such as timing braking and acceleration around turns and avoiding areas that impede the robot's motion, and over the course of training approach the performance of a human driver using a similar first-person interface.
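The checkpoint-based practice scheme described above can be illustrated with a minimal sketch. The function and parameter names below (`extract_checkpoints`, `practice_lap`, `spacing`, `reach_radius`) are hypothetical and not from the paper's code; the sketch subsamples a demonstration trajectory into spaced checkpoints and greedily drives through them, omitting the learned policy and the automatic reset on collision that the real system uses.

```python
import math

def extract_checkpoints(demo_positions, spacing=2.0):
    """Subsample a demonstration trajectory (list of (x, y) points)
    into checkpoints spaced at least `spacing` apart.
    Hypothetical helper illustrating the idea, not the paper's code."""
    checkpoints = [demo_positions[0]]
    for p in demo_positions[1:]:
        if math.dist(p, checkpoints[-1]) >= spacing:
            checkpoints.append(p)
    return checkpoints

def practice_lap(checkpoints, start, step_size=1.0,
                 reach_radius=0.5, max_steps=100):
    """Toy stand-in for one practice episode: move toward each
    checkpoint in order and return how many were reached.
    The real system instead queries a learned policy and
    auto-resets on collision or failure."""
    pos = list(start)
    idx = 0
    for _ in range(max_steps):
        tx, ty = checkpoints[idx]
        dx, dy = tx - pos[0], ty - pos[1]
        dist = math.hypot(dx, dy)
        if dist < reach_radius:
            idx += 1
            if idx == len(checkpoints):
                break  # lap complete
            continue
        # Take a bounded step toward the current checkpoint.
        scale = min(step_size, dist) / dist
        pos[0] += dx * scale
        pos[1] += dy * scale
    return idx
```

For example, a straight 6-point demonstration at unit intervals with `spacing=2.0` yields three checkpoints, and the toy follower reaches all of them.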