We present a reinforcement-learning-based solution for autonomous racing on a miniature race car platform. We show that a policy trained purely in simulation, using a relatively simple vehicle model together with model randomization, can be successfully transferred to the real robotic setup. We achieve this with a novel policy-output regularization approach and a lifted action space, which enable smooth actions while preserving aggressive race car driving. We show that this regularized policy outperforms the Soft Actor-Critic (SAC) baseline, both in simulation and on the real car, but is still outperformed by a state-of-the-art Model Predictive Control (MPC) method. Refining the policy with three hours of real-world interaction data allows the reinforcement learning policy to achieve lap times similar to the MPC controller while reducing track constraint violations by 50%.
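To make the "lifted action space" idea concrete, below is a minimal Python sketch (not from the paper): the policy commands action *rates*, which are integrated into the actual steering and throttle commands, so the executed actions stay smooth even when the raw policy output jumps between time steps. The class name, the bounds, the rate limits, and the time step `dt` are all illustrative assumptions.

```python
import numpy as np

class LiftedActionIntegrator:
    """Hypothetical sketch of a lifted action space: the policy outputs
    normalized action rates in [-1, 1]^2, which are scaled, integrated,
    and clipped to produce smooth [steering, throttle] commands."""

    def __init__(self, dt=0.02,
                 low=np.array([-0.35, -1.0]),     # [steering (rad), throttle] lower bounds (assumed)
                 high=np.array([0.35, 1.0]),      # upper bounds (assumed)
                 max_rate=np.array([3.0, 5.0])):  # per-channel rate limits (assumed)
        self.dt = dt              # control period in seconds (assumed)
        self.low, self.high = low, high
        self.max_rate = max_rate
        self.u = np.zeros_like(low)  # lifted state: the last applied command

    def step(self, policy_output):
        # Interpret the policy output as a normalized rate of change.
        rate = np.clip(policy_output, -1.0, 1.0) * self.max_rate
        # Integrate the rate and keep the command inside its physical bounds.
        self.u = np.clip(self.u + rate * self.dt, self.low, self.high)
        return self.u  # smooth command applied to the car

# Usage: commands evolve smoothly even under a jittery stand-in policy.
integ = LiftedActionIntegrator()
for _ in range(5):
    a = np.random.uniform(-1, 1, size=2)  # stand-in for the SAC policy output
    print(integ.step(a))
```

Under this lifting, policy-output regularization can be sketched as a penalty on the commanded rates, e.g. subtracting a term proportional to `np.sum(rate**2)` from the reward, although the exact regularizer used in the paper may differ.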