Modeling Human Driving Behavior through Generative Adversarial Imitation Learning

An open problem in autonomous vehicle safety validation is building reliable models of human driving behavior in simulation. This work presents an approach to learning neural driving policies from real-world driving demonstration data. We model human driving as a sequential decision-making problem characterized by nonlinearity, stochasticity, and unknown underlying cost functions. Imitation learning is an approach for generating intelligent behavior when the cost function is unknown or difficult to specify. Building upon work in inverse reinforcement learning (IRL), Generative Adversarial Imitation Learning (GAIL) aims to provide effective imitation even for problems with large or continuous state and action spaces, such as modeling human driving. This article describes the use of GAIL for learning-based driver modeling, along with three extensions. First, because driver modeling is inherently a multi-agent problem in which the interaction between agents must be modeled, we describe a parameter-sharing extension of GAIL called PS-GAIL. Second, GAIL is domain agnostic, which makes it difficult to encode driving-specific knowledge in the learning process; we describe Reward Augmented Imitation Learning (RAIL), which modifies the reward signal to provide domain-specific knowledge to the agent. Third, human demonstrations depend on latent factors that GAIL may not capture; we describe Burn-InfoGAIL, which allows for disentanglement of latent variability in demonstrations. Imitation learning experiments are performed using NGSIM, a real-world highway driving dataset. Experiments show that these modifications to GAIL can successfully model highway driving behavior, accurately replicating human demonstrations and generating realistic, emergent behavior in the traffic flow arising from the interaction between driving agents.
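
The idea of modifying the reward signal, as RAIL is described as doing, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names, the discriminator convention (output is the estimated probability that a state-action pair came from human demonstrations), the choice of penalized events (collision, driving off the road), and the penalty magnitudes.

```python
import math


def gail_surrogate_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Surrogate imitation reward computed from a discriminator output.

    Assumption: d_prob is the discriminator's estimated probability that the
    (state, action) pair came from the human demonstrations, so a higher
    value should yield a higher reward. Other sign conventions exist.
    """
    d_prob = min(max(d_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(1.0 - d_prob)


def augmented_reward(
    d_prob: float,
    collided: bool,
    off_road: bool,
    collision_penalty: float = 2.0,  # hypothetical magnitude
    off_road_penalty: float = 2.0,   # hypothetical magnitude
) -> float:
    """Imitation reward minus hand-specified penalties for undesirable events.

    The penalized events and their weights are illustrative placeholders for
    the kind of domain knowledge that reward augmentation can inject.
    """
    reward = gail_surrogate_reward(d_prob)
    if collided:
        reward -= collision_penalty
    if off_road:
        reward -= off_road_penalty
    return reward


if __name__ == "__main__":
    # Same discriminator score, with and without a collision.
    print(augmented_reward(0.5, collided=False, off_road=False))
    print(augmented_reward(0.5, collided=True, off_road=False))
```

In this sketch the policy still receives its main learning signal from the discriminator, while the penalty terms steer it away from behavior that a domain expert knows to be undesirable, even when the discriminator alone does not punish it strongly.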
