Simulation environments are well suited for learning different driving tasks, such as lane changing, parking, or handling intersections, in an abstract manner. However, these environments typically restrict themselves to conservative interaction behavior among vehicles. Real driving, in contrast, frequently involves high-risk scenarios in which other drivers do not behave as expected, whether because they are tired, inexperienced, or otherwise impaired. Standard simulation environments do not take this information into account when training navigation agents. In this study, we therefore focus on systematically creating such risk-prone scenarios, featuring heavy traffic and unexpected, randomized driver behavior, in order to train better model-free learning agents. We generate multiple autonomous driving scenarios by building new custom Markov Decision Process (MDP) environment variants in the highway-env simulation package. Agents learn a behavior policy via deep reinforcement learning; this policy is designed to handle collisions and risky, randomized driver behavior. We train model-free learning agents with supplementary information from the risk-prone driving scenarios and compare their performance against baseline agents. Finally, we causally measure the impact of adding these perturbations to the training process, so as to precisely account for the performance improvement obtained from utilizing the learnings from these scenarios.
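As a minimal sketch of what such a custom risk-prone environment variant could look like, the snippet below configures highway-env's standard highway-v0 task with denser traffic and a more erratic driver model. The specific config values are illustrative placeholders, not the paper's exact settings:

```python
import gymnasium as gym
import highway_env  # noqa: F401 -- importing registers the highway-env environments

# Illustrative risk-prone configuration: heavier traffic plus a more
# aggressive surrounding-driver model than the default IDM behavior.
risky_config = {
    "lanes_count": 4,
    "vehicles_count": 50,       # heavy traffic
    "vehicles_density": 2.0,    # pack vehicles more tightly than the default
    # Swap the default IDM drivers for highway-env's more erratic
    # behavior model to emulate unexpected, risky driving.
    "other_vehicles_type": "highway_env.vehicle.behavior.AggressiveVehicle",
    "collision_reward": -2.0,   # penalize collisions more heavily
    "duration": 40,             # episode length in policy steps
}

env = gym.make("highway-v0")
env.unwrapped.configure(risky_config)
obs, info = env.reset(seed=0)
```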
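The abstract does not name the specific model-free algorithm used. As a hedged illustration only, a DQN agent from stable-baselines3 could be trained on the environment configured above; the hyperparameters below are placeholders rather than the authors' settings:

```python
from stable_baselines3 import DQN

# Train a model-free agent on the risk-prone environment defined above.
model = DQN(
    "MlpPolicy",
    env,
    learning_rate=5e-4,
    buffer_size=15_000,
    learning_starts=200,
    batch_size=32,
    gamma=0.8,
    target_update_interval=50,
    verbose=1,
)
model.learn(total_timesteps=20_000)
model.save("dqn_risky_highway")  # hypothetical output path
```

Comparing this agent's collision rate and episode return against an agent trained under the default configuration yields the baseline comparison described above.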