Generative Adversarial Imitation Learning for End-to-End Autonomous Driving on Urban Environments

Autonomous driving is a complex task that has been tackled since ALVINN, the first self-driving car, in 1989 with a supervised learning approach known as behavioral cloning (BC). In BC, a neural network is trained on state-action pairs that make up a training set produced by an expert, i.e., a human driver. However, this type of imitation learning does not take into account the temporal dependencies that may exist between actions taken at different moments of a navigation trajectory. This type of task is better handled by reinforcement learning (RL) algorithms, which require a reward function to be defined. More recent approaches to imitation learning, such as Generative Adversarial Imitation Learning (GAIL), can instead train policies without requiring an explicitly defined reward function, allowing an agent to learn by trial and error directly from a training set of expert trajectories. In this work, we propose two variations of GAIL for autonomous navigation of a vehicle in the realistic CARLA simulation environment for urban scenarios. Both use the same network architecture, which processes high-dimensional image input from three frontal cameras together with nine other continuous inputs representing the velocity, the next point from the sparse trajectory, and a high-level driving command. We show that both are capable of imitating the expert trajectory from start to end once training is complete, but the variant whose GAIL loss function is augmented with BC outperforms the unaugmented variant in terms of convergence time and training stability.
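
The abstract does not state the exact form of the BC-augmented loss, so the following is only a minimal sketch of how such a combination might look. It assumes a discriminator that outputs a probability near 1 for expert state-action pairs and near 0 for policy pairs, a mean-squared-error BC term, and a simple weighted sum. All function names and the `bc_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def discriminator_loss(d_expert, d_policy, eps=1e-8):
    """Binary cross-entropy for the discriminator.

    Assumed convention: D should score expert pairs near 1, policy pairs near 0.
    d_expert, d_policy: lists of discriminator outputs in (0, 1).
    """
    expert_term = -sum(math.log(d + eps) for d in d_expert) / len(d_expert)
    policy_term = -sum(math.log(1.0 - d + eps) for d in d_policy) / len(d_policy)
    return expert_term + policy_term


def surrogate_reward(d_value, eps=1e-8):
    """Reward handed to the RL update in place of a hand-defined reward.

    Grows as the discriminator judges the pair to be more expert-like.
    """
    return -math.log(1.0 - d_value + eps)


def bc_loss(policy_actions, expert_actions):
    """Mean squared error between policy and expert actions on expert states."""
    squared = [
        (p - e) ** 2
        for pa, ea in zip(policy_actions, expert_actions)
        for p, e in zip(pa, ea)
    ]
    return sum(squared) / len(squared)


def augmented_policy_loss(rl_loss, bc, bc_weight=1.0):
    """Hypothetical combination: RL loss on surrogate rewards plus weighted BC term."""
    return rl_loss + bc_weight * bc
```

In this sketch, setting `bc_weight` to zero recovers a plain GAIL policy update, while a positive weight adds a supervised pull toward the expert's actions; the weighting actually used in the work may differ.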
