Connected vehicles will reshape the modes of future transportation management and organization, especially at intersections without traffic lights. Centralized coordination methods globally coordinate vehicles approaching the intersection from all sections by considering their states altogether. However, they require substantial computation resources, since a centralized controller must optimize the trajectories of all approaching vehicles in real time. In this paper, we propose a centralized coordination scheme for automated vehicles at an intersection without traffic signals using reinforcement learning (RL), in order to address the low computational efficiency of existing centralized coordination methods. We first propose an RL training algorithm, model-accelerated proximal policy optimization (MA-PPO), which incorporates a prior model into the proximal policy optimization (PPO) algorithm to accelerate learning in terms of sample efficiency. We then present the design of the state, action, and reward to formulate centralized coordination as an RL problem. Finally, we train a coordination policy in a simulation setting and compare its computing time and traffic efficiency with those of a coordination scheme based on model predictive control (MPC). Results show that our method requires only 1/400 of the computing time of MPC and increases the efficiency of the intersection by 4.5 times.
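To make the model-acceleration idea concrete, below is a minimal, hypothetical Python sketch of how a prior dynamics model can supplement real environment transitions with synthetic rollouts before a PPO update, which is the general mechanism by which sample efficiency improves. All names here (`prior_model`, `reward_fn`, the toy kinematics and reward) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical interfaces; the paper's actual state/action/reward design differs.

def prior_model(state, action):
    """Prior dynamics model: predicts the next joint state of the
    approaching vehicles (placeholder: simple kinematic update)."""
    return state + 0.1 * action  # assumed point-mass kinematics

def reward_fn(state, action):
    """Placeholder reward: penalize deviation from target states and
    large control effort."""
    return -np.sum(state**2) - 0.01 * np.sum(action**2)

def collect_real_rollout(env_step, policy, s0, horizon):
    """Collect a trajectory from the real environment (simulator)."""
    traj, s = [], s0
    for _ in range(horizon):
        a = policy(s)
        s_next, r = env_step(s, a)
        traj.append((s, a, r, s_next))
        s = s_next
    return traj

def model_rollout(policy, s0, horizon):
    """Generate a synthetic trajectory from the prior model instead of
    the environment; this is where sample efficiency is gained."""
    traj, s = [], s0
    for _ in range(horizon):
        a = policy(s)
        s_next = prior_model(s, a)
        traj.append((s, a, reward_fn(s, a), s_next))
        s = s_next
    return traj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = lambda s: -0.5 * s  # toy linear policy standing in for the actor network
    env_step = lambda s, a: (prior_model(s, a) + 0.01 * rng.standard_normal(s.shape),
                             reward_fn(s, a))  # stand-in for the traffic simulator
    s0 = rng.standard_normal(4)
    real = collect_real_rollout(env_step, policy, s0, horizon=5)
    synthetic = model_rollout(policy, s0, horizon=20)
    # A standard PPO update (clipped surrogate objective) would then
    # consume the union of real and synthetic transitions.
    print(len(real), len(synthetic))
```

The design point this sketch captures is that cheap model-generated transitions can outnumber expensive real ones (here 20 vs. 5), so the policy sees many more training samples per environment interaction.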