Reducing Bus Bunching with Asynchronous Multi-Agent Reinforcement Learning

The bus system is a critical component of sustainable urban transportation. However, because passenger demand and traffic conditions are highly uncertain, bus operation is inherently unstable, and bus bunching has become a common phenomenon that undermines the reliability and efficiency of bus services. Despite recent advances in multi-agent reinforcement learning (MARL) for traffic control, little research has focused on bus fleet control, largely because the problem is asynchronous: a control action is taken only when a bus arrives at a bus stop, so agents do not act simultaneously. In this study, we formulate route-level bus fleet control as an asynchronous multi-agent reinforcement learning (ASMR) problem and extend the classical actor-critic architecture to handle this asynchrony. Specifically, we design a novel critic network that approximates the marginal contribution of other agents, using a graph attention neural network to perform inductive learning for policy evaluation. The critic structure also helps the ego agent optimize its policy more efficiently. We evaluate the proposed framework on real-world bus services with actual passenger demand derived from smart card data. Our results show that the proposed model outperforms both traditional headway-based control methods and existing MARL methods.
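To make the asynchronous setting concrete, the following is a minimal sketch of an event-driven loop in which exactly one bus makes a decision at each arrival event. It is not the framework proposed in this study: the function names, the parameter values, and the simple forward-headway holding rule (standing in for a learned policy) are all illustrative assumptions.

```python
import heapq


def holding_time(forward_headway, target_headway, max_hold):
    """Placeholder policy (assumption): hold longer when the bus ahead left recently."""
    if forward_headway is None:  # first bus at this stop, nothing to compare against
        return 0.0
    return min(max(target_headway - forward_headway, 0.0), max_hold)


def simulate(dispatch_times, n_stops, travel_time, target_headway, max_hold):
    """Return the decision log [(time, bus, stop, hold), ...] in event order."""
    # One pending arrival event per bus: (arrival_time, bus_id, stop_index).
    events = [(t, bus, 0) for bus, t in enumerate(dispatch_times)]
    heapq.heapify(events)
    last_departure = [None] * n_stops  # most recent departure time at each stop
    log = []
    while events:
        # Only the bus that has just arrived acts; all other buses are in transit.
        t, bus, stop = heapq.heappop(events)
        prev = last_departure[stop]
        headway = None if prev is None else t - prev
        hold = holding_time(headway, target_headway, max_hold)
        log.append((t, bus, stop, hold))
        departure = t + hold
        last_departure[stop] = departure
        if stop + 1 < n_stops:
            # Schedule this bus's next decision point at the following stop.
            heapq.heappush(events, (departure + travel_time, bus, stop + 1))
    return log


# Hypothetical scenario: three buses dispatched with uneven gaps on a three-stop route.
log = simulate(dispatch_times=[0.0, 2.0, 9.0], n_stops=3,
               travel_time=5.0, target_headway=6.0, max_hold=3.0)
for t, bus, stop, hold in log:
    print(f"t={t:5.1f}  bus {bus} at stop {stop}  hold={hold:.1f}")
```

The log interleaves decisions from different buses at irregular times, so there is no shared time step at which all agents observe and act together. This is the property that the synchronous assumption of standard MARL methods does not accommodate.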
