Reinforcement learning (RL) has been shown to be a valuable tool for training neural networks for autonomous motion planning. Applying RL to a specific problem depends on a reward signal that quantifies how good or bad a given action is. This paper addresses the problem of reward-signal design for robotic control in the context of local planning for autonomous racing. We aim to design reward signals that perform well on multiple competing, continuous metrics. Three methodologies, based on position, velocity, and action rewards, are considered and evaluated in the context of F1/10th racing. A novel method of rewarding the agent based on its state relative to an optimal trajectory is presented. Agents are trained and tested in simulation, and the behaviors generated by the reward signals are compared on the basis of average lap time and completion rate. The results indicate that a reward based on distance and velocity relative to a minimum-curvature trajectory produces the fastest lap times.
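To make the trajectory-relative reward concrete, the following is a minimal sketch of how a reward based on distance and velocity relative to a precomputed minimum-curvature raceline might be computed. The function name, the exponential shaping, and the scale parameters are illustrative assumptions, not the paper's exact formulation; it assumes the raceline is given as a sequence of waypoints with an associated reference speed at each point.

```python
import numpy as np

def trajectory_reward(position, velocity, raceline_points, raceline_speeds,
                      d_scale=1.0, v_scale=1.0):
    """Hypothetical trajectory-relative reward (illustrative only).

    Rewards the agent for being close to the raceline and for matching
    the reference speed at the nearest raceline point. Both terms decay
    exponentially, so the reward is 1 when exactly on the line at the
    reference speed and approaches 0 as deviation grows.
    """
    # Distance from the car to every raceline waypoint
    dists = np.linalg.norm(raceline_points - position, axis=1)
    i = np.argmin(dists)  # index of the nearest waypoint

    d_term = np.exp(-d_scale * dists[i])                      # position term
    v_term = np.exp(-v_scale * abs(velocity - raceline_speeds[i]))  # speed term
    return d_term * v_term
```

Multiplying the two terms (rather than summing them) means the agent must satisfy both objectives at once; an agent that is fast but far off the line, or on the line but slow, earns little reward.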