The problem of autonomous navigation is to generate a set of navigation references which, when followed, move the vehicle from a starting position to an end goal location while avoiding obstacles. Autonomous racing complicates the navigation problem by adding the objective of minimising the time to complete a track. Minimum-time solutions require that the planner is concerned with the optimality of the trajectory according to the vehicle dynamics. Neural networks, trained from experience with reinforcement learning, have been shown to be effective local planners which generate navigation references to follow a global plan and avoid obstacles. We address the problem of designing a reward signal which can be used to train neural network-based local planners to race in a time-efficient manner and avoid obstacles. The general challenge of reward signal design is to represent a desired behavior in an equation that can be calculated at each time step. The specific challenge of designing a reward signal for autonomous racing is to encode obstacle-free, time-optimal racing trajectories in a clear signal. We propose several methods of encoding ideal racing behavior using a combination of the position and velocity of the vehicle and the actions taken by the network. The reward function candidates are expressed as equations and evaluated in the context of F1/10th autonomous racing. The results show that the best reward signal rewards velocity along, and punishes the lateral deviation from, a precalculated, optimal reference trajectory.
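The best-performing reward described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the weight `w_d`, and the use of the heading error to project speed onto the reference trajectory are assumptions for the sake of a concrete example.

```python
import math

def trajectory_reward(speed, heading_error, cross_track_dist, w_d=1.0):
    """Illustrative per-step reward for trajectory-following racing.

    speed            -- vehicle speed (m/s), assumed non-negative
    heading_error    -- angle (rad) between vehicle heading and the
                        reference trajectory tangent at the nearest point
    cross_track_dist -- lateral deviation (m) from the reference trajectory
    w_d              -- assumed penalty weight on lateral deviation
    """
    # Reward the velocity component along the reference trajectory...
    progress = speed * math.cos(heading_error)
    # ...and punish lateral deviation from it.
    penalty = w_d * abs(cross_track_dist)
    return progress - penalty
```

Under this sketch, driving fast and aligned with the reference trajectory maximises the reward, while drifting sideways or deviating laterally reduces it, which matches the behavior the abstract reports as most effective.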