Automated vehicles are gradually entering people's daily lives to provide a comfortable driving experience. Generic, user-agnostic automated vehicles have limited ability to accommodate the different driving styles of different users. This limitation not only impacts user satisfaction but also raises safety concerns. Learning from user demonstrations can provide direct insight into a user's driving preferences; however, it is difficult to infer a driver's preferences from limited data. In this study, we use a model-free inverse reinforcement learning method to study drivers' characteristics in the car-following scenario on a naturalistic driving dataset, and show that this method can represent users' preferences with reward functions. To predict driving styles for drivers with limited data, we apply Gaussian Mixture Models and compute a specific driver's similarity to clusters of drivers. We design a personalized adaptive cruise control (P-ACC) system based on a partially observable Markov decision process (POMDP) model that integrates the learned reward function to mimic each driver's style, with a constraint on the relative distance to ensure driving safety. Driving-style prediction achieves 85.7% accuracy using data from fewer than 10 car-following events. Model-based experimental driving trajectories demonstrate that the P-ACC system can provide a personalized driving experience.
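The driving-style prediction step described above can be sketched with a Gaussian Mixture Model: fit style clusters over a population of drivers, then score a new driver's similarity to each cluster from only a handful of car-following events. This is a minimal illustration using scikit-learn; the feature choices (time headway and following gap) and the synthetic data are assumptions for demonstration, not the paper's actual features or dataset.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Assumed per-event features: [mean time headway (s), mean following gap (m)].
# Two synthetic driver populations stand in for the naturalistic dataset.
aggressive = rng.normal([1.0, 15.0], [0.2, 3.0], size=(50, 2))
conservative = rng.normal([2.5, 35.0], [0.3, 5.0], size=(50, 2))
X = np.vstack([aggressive, conservative])

# Fit a two-component GMM over the driver population to form style clusters.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# A new driver observed for fewer than 10 car-following events: average the
# event features and compute the posterior similarity to each style cluster.
new_events = rng.normal([1.1, 16.0], [0.2, 3.0], size=(8, 2))
similarity = gmm.predict_proba(new_events.mean(axis=0, keepdims=True))[0]
print(similarity)  # posterior weight per cluster; the weights sum to 1
```

The posterior vector plays the role of the similarity measure in the abstract: the driver is assigned (softly) to the style cluster with the largest weight, which then selects the reward function used by the P-ACC controller.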