In this work, we present a rigorous end-to-end control strategy for autonomous vehicles aimed at minimizing lap times in a time attack racing event. We also introduce the AutoRACE Simulator, developed as part of this research project, which was employed to simulate accurate vehicular and environmental dynamics along with realistic audio-visual effects. We adopted a hybrid imitation-reinforcement learning architecture and crafted a novel reward function to train a deep neural network policy to drive (via imitation learning) and race (via reinforcement learning) a car autonomously in less than 20 hours. Deployment results are reported as a direct comparison of 10 autonomous laps against 100 manual laps driven by 10 different human players. The autonomous agent not only exhibited superior performance, beating the best manual lap by 0.96 seconds, but also outperformed the human players' mean lap time by 1.46 seconds. This dominance can be attributed to the agent's better trajectory optimization and lower reaction time.