Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to identify driving preferences and produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we use a combination of imitation and reinforcement learning to train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision risk. To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
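The core idea of combining an imitation objective with a reward-driven reinforcement term can be illustrated with a minimal sketch. This is not the paper's actual method (the paper does not specify its algorithm in the abstract); the function names, the simple policy-gradient surrogate, and the weighting scheme below are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # convert raw policy scores into a probability distribution over actions
    z = np.exp(logits - logits.max())
    return z / z.sum()

def bc_loss(logits, expert_action):
    # imitation term: negative log-likelihood of the human expert's action
    return -np.log(softmax(logits)[expert_action])

def rl_loss(logits, sampled_action, reward):
    # illustrative policy-gradient surrogate: -reward * log pi(a);
    # a negative reward (e.g. a collision penalty) pushes pi(a) down
    return -reward * np.log(softmax(logits)[sampled_action])

def combined_loss(logits, expert_action, sampled_action, reward, lam=0.1):
    # weighted sum of the imitation and reinforcement objectives;
    # lam trades off human-likeness against the simple safety reward
    return bc_loss(logits, expert_action) + lam * rl_loss(logits, sampled_action, reward)

# toy example: 3 discrete actions, expert chose action 0,
# the policy sampled action 1 and incurred a penalty
logits = np.array([2.0, 0.5, -1.0])
loss = combined_loss(logits, expert_action=0, sampled_action=1, reward=-1.0)
```

The interplay of the two terms is the point: the imitation term anchors the policy to human driving data, while the reward term supplies a learning signal in rare, high-risk situations that the demonstrations cover poorly.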