We develop a hierarchical controller for head-to-head autonomous racing. We first introduce a formulation of a racing game with realistic safety and fairness rules. A high-level planner approximates the original formulation as a discrete game with simplified state, control, and dynamics, which makes the complex safety and fairness rules easy to encode, and computes a series of target waypoints. A low-level controller takes the resulting waypoints as a reference trajectory and computes high-resolution control inputs by solving an alternative formulation with simplified objectives and constraints. We consider two approaches for the low-level controller, yielding two hierarchical controllers: one uses multi-agent reinforcement learning (MARL), and the other solves a linear-quadratic Nash game (LQNG) to produce control inputs. The controllers are compared against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show that the proposed hierarchical methods outperform their respective baselines in both head-to-head race wins and adherence to the rules. The hierarchical controller using MARL for low-level control consistently outperforms all other methods, winning over 88% of its head-to-head races and adhering more consistently to the complex racing rules. Qualitatively, we observe the proposed controllers mimicking actions performed by expert human drivers, such as shielding/blocking, overtaking, and long-term planning for delayed advantages. We show that hierarchical planning for game-theoretic reasoning produces competitive behavior even when challenged with complex rules and constraints.
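The two-level split described above can be illustrated with a minimal Python sketch. It is not the paper's method: the high-level game is replaced by an exhaustive single-agent search against a fixed opponent prediction, and the MARL/LQNG low-level controller is replaced by a proportional tracker. All names (`plan_waypoints`, `track`), the lane-based state, the cost, and the two rule stand-ins are assumptions made for illustration only.

```python
from itertools import product


def plan_waypoints(start_lane, opponent_lanes, n_lanes=3, max_lane_changes=1):
    """Toy high-level planner: choose one lane per upcoming checkpoint.

    Assumed stand-ins for the rules:
      * safety: never occupy the opponent's predicted lane at a checkpoint
      * fairness: at most `max_lane_changes` lane changes over the horizon
    Returns a list of lanes, or None if no sequence satisfies the rules.
    """
    horizon = len(opponent_lanes)
    best, best_cost = None, float("inf")
    for seq in product(range(n_lanes), repeat=horizon):
        lanes = (start_lane,) + seq
        steps = [abs(b - a) for a, b in zip(lanes, lanes[1:])]
        if any(s > 1 for s in steps):          # only adjacent lanes are reachable
            continue
        if sum(steps) > max_lane_changes:      # fairness stand-in
            continue
        if any(l == o for l, o in zip(seq, opponent_lanes)):  # safety stand-in
            continue
        cost = sum(seq)  # hypothetical cost: lane 0 is the shortest line
        if cost < best_cost:
            best, best_cost = list(seq), cost
    return best


def track(lateral_pos, waypoint_lane, lane_width=1.0, gain=0.5, steps=20):
    """Toy low-level controller: proportional steering toward the lane centre."""
    target = waypoint_lane * lane_width
    for _ in range(steps):
        lateral_pos += gain * (target - lateral_pos)
    return lateral_pos


if __name__ == "__main__":
    waypoints = plan_waypoints(start_lane=1, opponent_lanes=[1, 1, 0])
    position = 1.0
    for lane in waypoints:
        position = track(position, lane)
    print(waypoints, round(position, 3))
```

The point of the split is that discrete rules (lane-change limits, collision avoidance) are cheap to check over a short waypoint horizon, while the continuous controller only has to follow the chosen reference.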