Despite recent advances in reinforcement learning (RL), its application in safety-critical domains such as autonomous vehicles remains challenging. Although punishing RL agents for risky situations can help them learn safe policies, it may also lead to highly conservative behavior. In this paper, we propose a distributional RL framework for learning adaptive policies that can tune their level of conservatism at run-time based on the desired comfort and utility. Using a proactive safety verification approach, the proposed framework guarantees that the actions generated by RL are fail-safe under worst-case assumptions. Concurrently, the policy is encouraged to minimize safety interference and to generate more comfortable behavior. We trained and evaluated the proposed approach and baseline policies in a high-level simulator with a variety of randomized scenarios, including several corner cases that rarely occur in reality but are crucial. Our experiments show that policies learned with distributional RL are adaptive at run-time and robust to environment uncertainty. Quantitatively, the learned distributional RL agent drives on average 8 seconds faster than the standard DQN policy and requires 83\% less safety interference than the rule-based policy, while only slightly increasing the average crossing time. We also study the sensitivity of the learned policy in environments with higher perception noise and show that our algorithm learns policies that still drive reliably when the perception noise is twice as high as in the training configuration, for automated merging and crossing at occluded intersections.
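As a minimal illustrative sketch of how a distributional RL policy can tune its conservatism at run-time, the snippet below selects actions by averaging only the lowest predicted return quantiles (a CVaR-style criterion) and discards actions the safety layer could not verify as fail-safe. This is not the paper's exact implementation; the quantile critic, the `risk_level` parameter, and the `safe_action_mask` produced by the verification layer are assumptions chosen for illustration.

```python
import numpy as np

def select_action(quantile_values, risk_level, safe_action_mask):
    """Risk-sensitive action selection for a distributional RL agent (sketch).

    quantile_values: array (num_actions, num_quantiles), learned return
        distribution per action, e.g. from a QR-DQN-style critic (assumed).
    risk_level: value in (0, 1]; 1.0 averages all quantiles (risk-neutral),
        smaller values average only the lowest quantiles (more conservative).
    safe_action_mask: boolean array marking actions verified as fail-safe by a
        hypothetical safety-verification layer; unverified actions are excluded.
    """
    num_actions, num_quantiles = quantile_values.shape
    k = max(1, int(np.ceil(risk_level * num_quantiles)))
    # CVaR-style score: mean of the k lowest predicted return quantiles per action.
    sorted_q = np.sort(quantile_values, axis=1)
    score = sorted_q[:, :k].mean(axis=1)
    # Exclude actions that were not verified as fail-safe.
    score = np.where(safe_action_mask, score, -np.inf)
    return int(np.argmax(score))

# Example: three actions, eight quantiles each; only actions 0 and 2 verified safe.
rng = np.random.default_rng(0)
quantiles = rng.normal(size=(3, 8))
print(select_action(quantiles, risk_level=0.25,
                    safe_action_mask=np.array([True, False, True])))
```

Lowering `risk_level` at run-time makes the same trained policy behave more conservatively, while the mask keeps every chosen action within the verified fail-safe set.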