We develop a hierarchical controller for head-to-head autonomous racing. We first introduce a formulation of a racing game with realistic safety and fairness rules. A high-level planner approximates this formulation as a discrete game with simplified state, control, and dynamics, which makes the complex safety and fairness rules easy to encode, and computes a series of target waypoints. The low-level controller takes these waypoints as a reference trajectory and computes high-resolution control inputs by solving a second approximation of the original formulation with simplified objectives and constraints. We consider two approaches for the low-level controller, yielding two hierarchical controllers: one uses multi-agent reinforcement learning (MARL), and the other solves a linear-quadratic Nash game (LQNG) to produce control inputs. The controllers are compared against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show that the proposed hierarchical methods outperform their respective baselines in both head-to-head race wins and adherence to the rules. The hierarchical controller using MARL for low-level control consistently outperforms all other methods, winning over 90% of head-to-head races, and also adheres to the complex racing rules more reliably. Qualitatively, we observe the proposed controllers mimicking actions performed by expert human drivers, such as shielding/blocking, overtaking, and long-term planning for delayed advantages. We show that hierarchical planning for game-theoretic reasoning produces competitive behavior even when challenged with complex rules and constraints.
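As a rough illustration of the kind of problem an LQNG low-level controller solves, the following sketch computes feedback Nash equilibrium gains for a two-player, discrete-time, linear-quadratic game using the standard coupled backward Riccati recursion. It is a minimal sketch under loudly stated assumptions: a shared scalar state (for example, a tracking error relative to the high-level waypoints), linear dynamics, quadratic costs with no cross-control penalties, and a finite horizon. The names `ScalarLQGame` and `feedback_nash_gains` are hypothetical, and this is not necessarily the formulation or solver used by the LQNG controller described above.

```python
from dataclasses import dataclass


@dataclass
class ScalarLQGame:
    """Hypothetical two-player, discrete-time, scalar linear-quadratic game.

    Shared state:  x[t+1] = a*x[t] + b1*u1[t] + b2*u2[t]
    Player i cost: sum_t (q_i*x[t]**2 + r_i*u_i[t]**2) + qf_i*x[T]**2
    (Assumption: no player is penalized for the other player's control.)
    """
    a: float
    b1: float
    b2: float
    q1: float
    q2: float
    r1: float
    r2: float
    qf1: float
    qf2: float


def feedback_nash_gains(g: ScalarLQGame, horizon: int):
    """Return stage gains [(p1, p2), ...] with u_i[t] = -p_i[t] * x[t].

    Each player's cost-to-go is V_i(x) = z_i * x**2. At every stage, working
    backward from z_i = qf_i, both players' first-order conditions are
    solved jointly:
        (r1 + b1*z1*b1) p1 + b1*z1*b2 p2 = b1*z1*a
        b2*z2*b1 p1 + (r2 + b2*z2*b2) p2 = b2*z2*a
    """
    z1, z2 = g.qf1, g.qf2
    gains = []
    for _ in range(horizon):
        # Coefficients of the coupled 2x2 linear system in (p1, p2).
        m11 = g.r1 + g.b1 * z1 * g.b1
        m12 = g.b1 * z1 * g.b2
        m21 = g.b2 * z2 * g.b1
        m22 = g.r2 + g.b2 * z2 * g.b2
        rhs1 = g.b1 * z1 * g.a
        rhs2 = g.b2 * z2 * g.a
        det = m11 * m22 - m12 * m21
        if abs(det) < 1e-12:
            raise ValueError("singular stage system: no unique feedback Nash gains")
        # Cramer's rule for the 2x2 system.
        p1 = (rhs1 * m22 - m12 * rhs2) / det
        p2 = (m11 * rhs2 - m21 * rhs1) / det
        f = g.a - g.b1 * p1 - g.b2 * p2  # closed-loop state coefficient
        # Cost-to-go update: stage cost plus discounted-free future cost
        # under the joint equilibrium policy.
        z1 = f * z1 * f + g.q1 + p1 * g.r1 * p1
        z2 = f * z2 * f + g.q2 + p2 * g.r2 * p2
        gains.append((p1, p2))
    gains.reverse()  # gains[t] now applies at time step t
    return gains
```

With symmetric players (`a = b1 = b2 = q_i = r_i = qf_i = 1`), both players receive equal gains, and at the last stage, where `z_i = qf_i = 1`, each gain is 1/3. A vehicle-scale version would replace the scalars with state and input matrices and the Cramer's-rule step with a block linear solve; the recursion itself keeps the same structure.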