Safely navigating an urban environment without violating any traffic rules is a crucial performance target for reliable autonomous driving. In this paper, we present a Reinforcement Learning (RL) based methodology to DEtect and FIX (DeFIX) failures of an Imitation Learning (IL) agent by extracting infraction spots and reconstructing mini-scenarios at these infraction areas to train an RL agent that fixes the shortcomings of the IL approach. DeFIX is a continuous learning framework, where the extraction of failure scenarios and the training of RL agents are executed in an infinite loop. After each new policy is trained and added to the library of policies, a policy classifier decides which policy to activate at each step during evaluation. We demonstrate that even with only one RL agent trained on the failure scenarios of an IL agent, the DeFIX method is either competitive with or outperforms state-of-the-art IL- and RL-based autonomous urban driving benchmarks. We trained and validated our approach on the most challenging map (Town05) of the CARLA simulator, which involves complex, realistic, and adversarial driving scenarios. The source code is publicly available at https://github.com/data-and-decision-lab/DeFIX
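To make the detect-and-fix loop concrete, below is a minimal, hypothetical sketch of the continuous learning cycle described above. The helper names (evaluate_agent, build_mini_scenario, train_rl_policy, PolicyLibrary) are illustrative placeholders, not the authors' API, and the evaluation-time policy classifier is omitted for brevity.

```python
# Hypothetical sketch of the DeFIX continuous-learning loop (not the authors' code).
# All helpers are stubbed placeholders standing in for the real CARLA-based pipeline.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Policy:
    name: str
    act: Callable  # maps an observation to a control action


@dataclass
class PolicyLibrary:
    policies: List[Policy] = field(default_factory=list)

    def add(self, policy: Policy) -> None:
        self.policies.append(policy)


def evaluate_agent(library: PolicyLibrary) -> List[dict]:
    """Drive with the current policy library and return infraction records (stub)."""
    return []  # e.g. [{"location": (x, y), "type": "collision"}, ...]


def build_mini_scenario(infraction: dict) -> dict:
    """Reconstruct a short scenario around the infraction spot (stub)."""
    return {"spawn": infraction.get("location"), "length_m": 50}


def train_rl_policy(scenario: dict) -> Policy:
    """Train an RL agent on the reconstructed failure scenario (stub)."""
    return Policy(name=f"rl_fix_{scenario['spawn']}", act=lambda obs: obs)


def defix_loop(il_policy: Policy, max_iterations: int = 3) -> PolicyLibrary:
    library = PolicyLibrary([il_policy])
    for _ in range(max_iterations):               # the paper runs this loop continuously
        infractions = evaluate_agent(library)     # DEtect: find failures of the current agent
        if not infractions:
            break
        for infraction in infractions:
            scenario = build_mini_scenario(infraction)  # rebuild the failure spot
            library.add(train_rl_policy(scenario))      # FIX: add a new RL policy to the library
    return library
```

At evaluation time, the trained policy classifier would then select which policy from the resulting library to activate at each step.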