We present an approach to safe trajectory planning in which a strategic task related to autonomous racing is learned sample-efficiently within a simulation environment. A high-level policy, represented as a neural network, outputs a reward specification that enters the cost function of a parametric nonlinear model predictive controller (NMPC). By including constraints and vehicle kinematics in the nonlinear program (NLP), we can guarantee safe and feasible trajectories with respect to the model used. Compared to classical reinforcement learning (RL), our approach restricts exploration to safe trajectories, starts from a good prior performance, and yields full trajectories that can be passed to a low-level tracking controller. We do not address the low-level controller in this work and assume perfect tracking of feasible trajectories. We show the superior performance of our algorithm on simulated racing tasks that include high-level decision making: the vehicle learns to efficiently overtake slower vehicles and to avoid being overtaken by blocking faster vehicles.
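The hierarchy described above can be sketched in miniature: a high-level policy maps an observation to cost-function parameters, and a constrained trajectory optimizer then produces a full trajectory, so exploration only ever yields feasible motions. This is a hypothetical toy illustration, not the paper's implementation: a 1-D double-integrator stands in for the vehicle model, `scipy.optimize.minimize` stands in for a real NMPC solver, and all names (`high_level_policy`, `plan_trajectory`, the weight parameterization) are assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import minimize

def high_level_policy(obs, W):
    """Toy stand-in for the neural network: observation -> cost weights.
    exp() keeps the weights positive, so the low-level cost stays well posed."""
    return np.exp(W @ obs)

def plan_trajectory(weights, x0, v0, target, horizon=20, dt=0.1, a_max=2.0):
    """Parametric trajectory optimizer (stand-in for the NMPC): chooses
    accelerations under an input constraint, minimizing a cost weighted
    by the high-level policy output."""
    w_pos, w_acc = weights

    def rollout(acc):
        # Double-integrator kinematics: the model constraints live here,
        # so every candidate trajectory is dynamically feasible.
        x, v, xs = x0, v0, []
        for a in acc:
            v = v + dt * a
            x = x + dt * v
            xs.append(x)
        return np.array(xs)

    def cost(acc):
        xs = rollout(acc)
        return w_pos * np.sum((xs - target) ** 2) + w_acc * np.sum(acc ** 2)

    res = minimize(cost, np.zeros(horizon),
                   bounds=[(-a_max, a_max)] * horizon)  # actuation limits
    return rollout(res.x), res.x

obs = np.array([1.0, 0.5])                # hypothetical scene features
W = np.array([[0.5, 0.1], [-1.0, 0.2]])   # policy weights (would be learned)
traj, accels = plan_trajectory(high_level_policy(obs, W),
                               x0=0.0, v0=0.0, target=5.0)
```

Because the constraints sit inside the optimizer rather than in the learned policy, changing `W` only reshapes the cost; every trajectory the learner explores already respects the bounds, which is the safety argument of the abstract in compressed form.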