Building Safer Autonomous Agents by Leveraging Risky Driving Behavior Knowledge

Simulation environments are well suited to learning driving tasks such as lane changing, parking, or handling intersections in an abstract manner. However, they often restrict themselves to conservative interactions among vehicles. Real driving, by contrast, frequently involves high-risk scenarios in which other drivers do not behave as expected, for reasons such as tiredness or inexperience. Simulation environments do not take this into account when training the navigation agent. In this study we therefore focus on systematically creating such risk-prone scenarios, with heavy traffic and unexpected random behavior, in order to train better model-free learning agents. We generate multiple autonomous driving scenarios by creating new custom Markov Decision Process (MDP) environment iterations in the highway-env simulation package. The behavior policy is learned by agents trained with deep reinforcement learning models and is designed to handle collisions and risky, randomized driver behavior. We train model-free learning agents with supplementary information from risk-prone driving scenarios and compare their performance with that of baseline agents. Finally, we causally measure the impact of adding these perturbations to the training process in order to account precisely for the performance improvement gained from these scenarios.

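
The idea of deriving risk-prone scenarios from a conservative baseline, with heavier traffic and randomized driver behavior, can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the highway-env API: the `ScenarioConfig` class, its field names, the helper functions, and every numeric range below are hypothetical placeholders chosen only for illustration.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScenarioConfig:
    """Hypothetical scenario description (field names are NOT highway-env keys)."""
    vehicles_count: int           # number of surrounding vehicles
    risky_driver_fraction: float  # share of drivers given perturbed behavior
    reaction_delay_s: float       # extra reaction delay of risky drivers, seconds
    random_action_prob: float     # per-step chance a risky driver acts at random


# Conservative baseline: moderate traffic, no perturbed drivers (assumed values).
BASELINE = ScenarioConfig(
    vehicles_count=20,
    risky_driver_fraction=0.0,
    reaction_delay_s=0.0,
    random_action_prob=0.0,
)


def make_risky_scenario(base, rng, traffic_scale=2.0):
    """Return a risk-prone variant of `base`.

    Traffic is scaled up and the driver-behavior parameters are sampled
    from illustrative ranges (assumptions, not values from the paper).
    """
    return replace(
        base,
        vehicles_count=int(round(base.vehicles_count * traffic_scale)),
        risky_driver_fraction=rng.uniform(0.2, 0.5),
        reaction_delay_s=rng.uniform(0.5, 1.5),
        random_action_prob=rng.uniform(0.05, 0.2),
    )


def make_training_curriculum(base, n_risky, seed=0):
    """Baseline scenario followed by `n_risky` perturbed variants."""
    rng = random.Random(seed)  # seeded so the scenario set is reproducible
    return [base] + [make_risky_scenario(base, rng) for _ in range(n_risky)]


if __name__ == "__main__":
    for cfg in make_training_curriculum(BASELINE, n_risky=3):
        print(cfg)
```

Seeding the generator keeps the set of perturbed scenarios fixed across runs, so an agent trained on the baseline alone and one trained on the full list can be compared on the same conditions. In practice each configuration would still have to be mapped onto the simulator's own environment settings.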