We investigate the problem of autonomous racing among teams of cooperative agents that are subject to realistic racing rules. Our work extends previous research on hierarchical control in head-to-head autonomous racing by considering a generalized version of the problem while maintaining the two-level hierarchical control structure. A high-level tactical planner constructs a discrete game that encodes the complex rules using simplified dynamics to produce a sequence of target waypoints. The low-level path planner uses these waypoints as a reference trajectory and computes high-resolution control inputs by solving a simplified formulation of the racing game that retains a reduced representation of the realistic rules. We explore two approaches for the low-level path planner: training a multi-agent reinforcement learning (MARL) policy and solving a linear-quadratic Nash game (LQNG) approximation. We evaluate our controllers on simple and complex tracks against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show that our hierarchical methods outperform the baselines in terms of race wins, overall team performance, and compliance with the rules. Qualitatively, we observe that the hierarchical controllers mimic maneuvers performed by expert human drivers, such as coordinated overtaking, defending against multiple opponents, and long-term planning for delayed advantages.
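For context, the LQNG approximation mentioned above can be read as an instance of the standard discrete-time linear-quadratic Nash game; the sketch below is that generic formulation, and the matrices $A$, $B^i$, $Q^i$, $R^i$ and the waypoint reference $\bar{x}^i_t$ are illustrative placeholders rather than the paper's exact modeling choices. With $N$ players sharing a state $x_t$ and each applying an input $u^i_t$, the dynamics and per-player tracking costs are

\[
x_{t+1} = A x_t + \sum_{i=1}^{N} B^i u^i_t,
\qquad
J^i = \sum_{t=0}^{T-1} \Big( (x_t - \bar{x}^i_t)^\top Q^i (x_t - \bar{x}^i_t) + (u^i_t)^\top R^i u^i_t \Big),
\]

and a Nash equilibrium is a strategy profile $(u^{1*}, \dots, u^{N*})$ from which no player can lower its own cost $J^i$ by unilaterally deviating, i.e. $J^i(u^{i*}, u^{-i*}) \le J^i(u^i, u^{-i*})$ for all admissible $u^i$. For this linear-quadratic structure, the equilibrium strategies are affine in the state and can be computed from coupled Riccati recursions, which is what makes the approximation tractable as a low-level planner.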