We propose and validate a novel car-following model based on deep reinforcement learning. Our model is trained to maximize externally given reward functions for the free and car-following regimes rather than to reproduce existing follower trajectories. The parameters of these reward functions, such as desired speed, time gap, or accelerations, resemble those of traditional models such as the Intelligent Driver Model (IDM) and allow for explicitly implementing different driving styles. Moreover, they partially lift the black-box nature of conventional neural-network models. The model is trained on leading-vehicle speed profiles governed by a truncated Ornstein-Uhlenbeck process reflecting realistic leader kinematics. This allows for arbitrary driving situations and an infinite supply of training data. For various parameterizations of the reward functions, and for a wide variety of artificial and real leader data, the model turned out to be unconditionally string stable, comfortable, and crash-free. String stability was tested with a platoon of five followers following both an artificial and a real leading trajectory. A cross-comparison with the IDM, calibrated by maximizing the goodness-of-fit of the relative gaps, showed that the proposed model achieves both a higher reward and a better goodness-of-fit than the traditional model.
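For reference, the parameters named above (desired speed, time gap, accelerations) correspond roughly to those of the standard IDM acceleration law, which reads

\[
a_{\text{IDM}}(s, v, \Delta v) = a\left[1 - \left(\frac{v}{v_0}\right)^{\delta} - \left(\frac{s^{*}(v,\Delta v)}{s}\right)^{2}\right],
\qquad
s^{*}(v,\Delta v) = s_0 + v\,T + \frac{v\,\Delta v}{2\sqrt{a\,b}},
\]

where \(s\) is the gap, \(v\) the follower speed, \(\Delta v\) the approach rate, \(v_0\) the desired speed, \(T\) the desired time gap, \(a\) the maximum acceleration, \(b\) the comfortable deceleration, and \(s_0\) the minimum gap. The exact mapping between these parameters and the reward-function parameters of the proposed model is described in the paper itself, not in this abstract.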
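As a minimal sketch of how leader speed profiles of this kind can be generated, the snippet below draws a truncated Ornstein-Uhlenbeck speed trajectory using an Euler–Maruyama discretization. All parameter names and values (mean speed, relaxation time, noise amplitude, truncation to [0, v_max]) are illustrative assumptions, not the paper's actual settings, and the paper's truncation scheme may differ from simple clipping.

```python
import numpy as np

def truncated_ou_speed_profile(duration=300.0, dt=0.1, v_mean=15.0,
                               tau=20.0, sigma=1.5, v_max=30.0, seed=None):
    """Sample a leader speed profile from a truncated Ornstein-Uhlenbeck process.

    Euler-Maruyama step: dv = -(v - v_mean)/tau * dt + sigma * sqrt(dt) * N(0, 1),
    with the speed clipped to [0, v_max] as a simple form of truncation.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(duration / dt)
    v = np.empty(n_steps)
    v[0] = v_mean
    for i in range(1, n_steps):
        drift = -(v[i - 1] - v_mean) / tau * dt          # relaxation toward mean speed
        noise = sigma * np.sqrt(dt) * rng.standard_normal()  # stochastic acceleration
        v[i] = np.clip(v[i - 1] + drift + noise, 0.0, v_max)  # keep speeds physical
    return v

# Example usage: one 5-minute leader speed profile at 10 Hz.
speeds = truncated_ou_speed_profile(seed=42)
```

Because each call produces a new random trajectory, a generator of this kind provides the effectively unlimited training data referred to in the abstract.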