This paper explores the use of reinforcement learning (RL) models for autonomous racing. In contrast to passenger cars, where safety is the top priority, a racing car aims to minimize the lap time. We frame the problem as a reinforcement learning task with a multidimensional input consisting of vehicle telemetry and a continuous action space. To determine which RL methods solve the problem best and whether the obtained models generalize to driving on unknown tracks, we put 10 variants of deep deterministic policy gradient (DDPG) to race in two experiments: i)~studying how RL methods learn to drive a racing car and ii)~studying how the learning scenario influences the capability of the models to generalize. Our studies show that models trained with RL are not only able to drive faster than the baseline open-source handcrafted bots but also generalize to unknown tracks.
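The framing above maps naturally onto the standard actor-critic structure of DDPG: a deterministic actor maps the telemetry vector to continuous controls, and a critic scores state-action pairs. The following is a minimal sketch, assuming PyTorch; the dimensions, network sizes, and names (OBS_DIM, ACT_DIM, ddpg_update) are illustrative placeholders rather than the paper's actual architecture, and the replay buffer, target networks, and exploration noise of full DDPG are omitted for brevity.

```python
# Illustrative DDPG sketch (not the paper's implementation): a deterministic
# actor maps telemetry to continuous controls; a critic estimates Q(s, a).
# OBS_DIM and ACT_DIM are hypothetical placeholders.
import torch
import torch.nn as nn

OBS_DIM = 29   # hypothetical telemetry vector size (speed, track sensors, ...)
ACT_DIM = 3    # hypothetical controls: steering, throttle, brake

class Actor(nn.Module):
    """Deterministic policy mu(s): telemetry -> continuous action in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Action-value function Q(s, a) used for the deterministic policy gradient."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def ddpg_update(actor, critic, actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG update on a sampled batch of (B, dim)-shaped tensors.

    A full implementation would use target networks for the bootstrap term
    and soft-update them after each step; both are omitted here.
    """
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        target_q = rew + gamma * (1 - done) * critic(next_obs, actor(next_obs))
    critic_loss = nn.functional.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Deterministic policy gradient: ascend Q(s, mu(s)) w.r.t. actor parameters.
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```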