Traffic congestion in urban road networks leads to longer trip times and higher emissions, especially during peak periods. While the Shortest Path First (SPF) algorithm is optimal for a single vehicle in a static network, it performs poorly in dynamic, multi-vehicle settings and often worsens congestion by routing all vehicles along identical paths. We address dynamic vehicle routing with a multi-agent reinforcement learning (MARL) framework for coordinated, network-aware fleet navigation. We first propose Adaptive Navigation (AN), a decentralized MARL model in which each intersection agent provides routing guidance based on (i) local traffic and (ii) neighborhood state modeled using Graph Attention Networks (GAT). To improve scalability in large networks, we further propose Hierarchical Hub-based Adaptive Navigation (HHAN), an extension of AN that assigns agents only to key intersections (hubs). Vehicles are routed hub-to-hub under agent control, while SPF handles micro-routing within each hub region. For hub coordination, HHAN adopts centralized training with decentralized execution (CTDE) under the Attentive Q-Mixing (A-QMIX) framework, which aggregates asynchronous vehicle decisions via attention. Hub agents use flow-aware state features that combine local congestion and predictive dynamics for proactive routing. Experiments on synthetic grids and real urban maps (Toronto, Manhattan) show that AN reduces average travel time relative to SPF and learning baselines while maintaining a 100% routing success rate. HHAN scales to networks with hundreds of intersections, achieving an improvement of up to 15.9% under heavy traffic. These findings highlight the potential of network-constrained MARL for scalable, coordinated, and congestion-aware routing in intelligent transportation systems.
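The claim that SPF sends every vehicle down the same path can be illustrated with a toy sketch. This is not the AN or HHAN method: the four-node network, the free-flow travel times, and the linear load penalty (`penalty`) are all assumptions made for illustration, and the load-aware cost is only a hand-written stand-in for what a learned routing policy might capture.

```python
import heapq
from collections import Counter


def shortest_path(graph, cost, src, dst):
    """Dijkstra's algorithm; `graph` maps node -> neighbors, `cost` maps edge -> time."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + cost[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk predecessors back from the destination to recover the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical network: two routes from A to D, via B (faster) or via C (slower).
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
FREE_FLOW = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 1.5, ("C", "D"): 1.5}


def route_fleet(n_vehicles, load_aware, penalty=0.6):
    """Route vehicles one by one from A to D and return (paths, edge loads).

    With load_aware=False every vehicle sees static free-flow costs (plain SPF).
    With load_aware=True each edge cost grows linearly with the number of
    vehicles already assigned to it (an assumed congestion model).
    """
    load = Counter()
    paths = []
    for _ in range(n_vehicles):
        if load_aware:
            cost = {e: t + penalty * load[e] for e, t in FREE_FLOW.items()}
        else:
            cost = FREE_FLOW
        path = shortest_path(GRAPH, cost, "A", "D")
        for edge in zip(path, path[1:]):
            load[edge] += 1
        paths.append(path)
    return paths, load
```

With four vehicles, `route_fleet(4, load_aware=False)` places all four on A→B→D, while `route_fleet(4, load_aware=True)` splits them two and two across the B and C routes. The sketch only shows why static shortest paths concentrate traffic; it does not model vehicle dynamics, learning, or hub-level coordination.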