Bus bunching is a naturally occurring phenomenon that undermines the efficiency and stability of public transportation systems. Mainstream solutions mitigate it by intentionally holding buses longer at certain stations. Existing control methods fall into two groups: conventional methods, which compute the holding time from an explicit formula, and reinforcement learning (RL) methods, which learn a control policy through repeated interaction with the system. In this paper, we propose an integrated proximal policy optimization model with dual-headway (IPPO-DH). IPPO-DH integrates conventional headway control with reinforcement learning, inheriting the advantages of both: it is more efficient in normal environments and more stable in harsh ones. To demonstrate this advantage, we design a bus simulation environment and compare IPPO-DH with an RL baseline and several conventional methods. The results show that the proposed model retains the practical value of conventional control by avoiding the instability that RL exhibits in certain environments, while improving efficiency over conventional control, shedding new light on real-world bus transit system optimization.
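For context, the sketch below illustrates the kind of headway-based holding rule that conventional methods apply; it is only a minimal illustration, and the function name dual_headway_holding as well as the parameters alpha and max_hold are assumptions for this example, not the formula used by IPPO-DH.

    # Minimal sketch (Python) of a headway-based holding rule, assuming the
    # controller observes the forward headway (gap to the bus ahead) and the
    # backward headway (gap to the bus behind). alpha and max_hold are
    # hypothetical tuning parameters, not values from the paper.
    def dual_headway_holding(forward_headway, backward_headway,
                             alpha=0.5, max_hold=60.0):
        imbalance = backward_headway - forward_headway
        hold = alpha * max(imbalance, 0.0)   # hold only when the bus is closing in on its leader
        return min(hold, max_hold)           # cap the holding time to limit added delay

    # Example: 200 s to the leader, 400 s to the follower -> raw hold 100 s, capped at 60 s.
    print(dual_headway_holding(forward_headway=200.0, backward_headway=400.0))

In such rules the holding time grows with the headway imbalance, which counteracts bunching but adds dwell delay; IPPO-DH instead lets a learned policy decide how aggressively to apply this kind of correction.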