Autonomous Intersection Management (AIM) provides a signal-free intersection scheduling paradigm for Connected Autonomous Vehicles (CAVs). Distributed learning methods have emerged as an attractive branch of AIM research. Compared with centralized AIM, distributed AIM can be deployed to CAVs at a lower cost, and compared with rule-based and optimization-based methods, learning-based methods can handle varied and complicated real-time intersection scenarios more flexibly. Deep reinforcement learning (DRL) is the mainstream approach in distributed learning for AIM problems. However, the large number of agents making interactive decisions simultaneously, and the rapid changes in the environment caused by those interactions, pose challenges for DRL: its reward curve oscillates and is hard to converge, which ultimately compromises safety and computing efficiency. To address this, we propose a non-RL learning framework called Distributed Hierarchical Adversarial Learning (D-HAL). The framework includes an actor network that generates the action of each CAV at each step. An immediate discriminator evaluates the interaction performance of the actor network at the current step, while a final discriminator evaluates the overall trajectory produced by a series of interactions. In this framework, the long-term outcome of a behavior no longer motivates the actor network through discounted rewards, but through a designed adversarial loss function with discriminative labels. The proposed model is evaluated at a four-way, six-lane intersection and outperforms several state-of-the-art methods in ensuring safety and reducing travel time.
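The abstract does not give the form of the adversarial loss, so the following is only a minimal sketch of how an actor could be driven by two discriminators instead of discounted rewards. It assumes that both discriminators output a score in (0, 1), that label 1 marks a desirable interaction or trajectory, and that the two terms are combined by a weighted sum. The names `bce`, `actor_adversarial_loss`, and `final_weight` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def bce(score: float, label: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a discriminator score in (0, 1) and a 0/1 label."""
    p = min(max(score, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def actor_adversarial_loss(
    step_scores: Sequence[float],
    final_score: float,
    final_weight: float = 1.0,
) -> float:
    """Hypothetical actor loss (an assumption, not the paper's formula).

    step_scores: immediate-discriminator scores, one per step of a trajectory.
    final_score: final-discriminator score for the whole trajectory.
    The actor is pushed toward the "good" label (1) on both levels.
    """
    if not step_scores:
        raise ValueError("step_scores must contain at least one step")
    immediate_term = sum(bce(s, 1.0) for s in step_scores) / len(step_scores)
    final_term = bce(final_score, 1.0)
    return immediate_term + final_weight * final_term


# A trajectory that both discriminators rate highly yields a lower actor loss
# than one they rate poorly.
good = actor_adversarial_loss([0.9, 0.8], final_score=0.95)
bad = actor_adversarial_loss([0.2, 0.3], final_score=0.1)
```

In this sketch the per-step term plays the role of the immediate evaluation and the trajectory term plays the role of the final evaluation; how D-HAL actually weights or labels them is not stated in the abstract.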