Driving-Policy Adaptive Safeguard for Autonomous Vehicles Using Reinforcement Learning

Safeguard functions, such as those provided by advanced emergency braking (AEB), can add another layer of safety for autonomous vehicles (AVs). A smart safeguard should adapt its activation conditions to the driving policy, both to avoid unnecessary interventions and to improve vehicle safety. This paper proposes a driving-policy adaptive safeguard (DPAS) design consisting of a collision avoidance strategy and an activation function. The collision avoidance strategy is designed in a reinforcement learning framework and obtained by Monte-Carlo Tree Search (MCTS). It can learn from past collisions and control both braking and steering in stochastic traffic. The driving-policy adaptive activation function should dynamically assess the risk of the current driving policy and kick in when an urgent threat is detected. To generate this activation function, the exploration and rollout modules of MCTS are designed to fully evaluate the AV's current driving policy and then explore other, safer actions. In this study, the DPAS is validated with two typical highway-driving policies. The results are obtained through … and 90,000 runs in stochastic and aggressive simulated traffic, and are calibrated by naturalistic driving data. Compared with state-based benchmark safeguards, the proposed safeguard significantly reduces the collision rate without introducing more interventions. In summary, the proposed safeguard leverages a learning-based method in stochastic and emergent scenarios and imposes minimal influence on the driving policy.
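The abstract describes evaluating the AV's current driving policy through rollouts and then exploring safer actions before deciding whether to intervene. The sketch below illustrates only that idea, in a heavily simplified form and under stated assumptions: it replaces MCTS with flat Monte-Carlo rollouts (no search tree, no learning from past collisions, braking only, no steering), and the 1-D car-following model, action set, lead-vehicle accelerations, risk threshold, and all names (`aggressive_policy`, `collision_risk`, `safeguard`) are hypothetical, not the paper's design.

```python
import random
from typing import Callable, Dict, Tuple

# All constants below are illustrative assumptions, not values from the paper.
DT = 0.5                         # simulation step [s]
HORIZON = 10                     # steps per rollout
LEAD_ACCELS = (-2.0, 0.0, 1.0)   # random lead-vehicle accelerations [m/s^2]
ACTIONS: Dict[str, float] = {"keep": 0.0, "brake": -4.0, "hard_brake": -8.0}

State = Tuple[float, float, float]  # (gap [m], ego speed [m/s], lead speed [m/s])
Policy = Callable[[State], str]


def aggressive_policy(state: State) -> str:
    """Hypothetical driving policy: brakes only once the gap is below 5 m."""
    gap, _, _ = state
    return "keep" if gap >= 5.0 else "brake"


def constant(action: str) -> Policy:
    """Policy that repeats a single action for the whole rollout."""
    return lambda state: action


def rollout(state: State, policy: Policy, rng: random.Random) -> bool:
    """Simulate one stochastic 1-D car-following episode; True means collision."""
    gap, v_ego, v_lead = state
    for _ in range(HORIZON):
        a_ego = ACTIONS[policy((gap, v_ego, v_lead))]
        a_lead = rng.choice(LEAD_ACCELS)          # stochastic traffic
        v_ego = max(0.0, v_ego + a_ego * DT)
        v_lead = max(0.0, v_lead + a_lead * DT)
        gap += (v_lead - v_ego) * DT
        if gap <= 0.0:
            return True
    return False


def collision_risk(state: State, policy: Policy, n: int = 200, seed: int = 0) -> float:
    """Monte-Carlo estimate of the collision probability under `policy`."""
    rng = random.Random(seed)
    return sum(rollout(state, policy, rng) for _ in range(n)) / n


def safeguard(state: State, policy: Policy, threshold: float = 0.1) -> Tuple[bool, str]:
    """Override the driving policy only if its estimated risk exceeds
    `threshold` and some alternative action is estimated to be safer."""
    policy_risk = collision_risk(state, policy)
    alt_risks = {a: collision_risk(state, constant(a)) for a in ACTIONS}
    best = min(alt_risks, key=alt_risks.get)
    if policy_risk > threshold and alt_risks[best] < policy_risk:
        return True, best
    return False, policy(state)
```

For a large, steady gap, `safeguard((100.0, 20.0, 20.0), aggressive_policy)` estimates zero risk and returns `(False, 'keep')`, so the driving policy is left alone. For a closing gap such as `(10.0, 20.0, 10.0)`, the aggressive policy collides in every rollout and the safeguard takes over with a lower-risk braking action. Because the risk is estimated by rolling out the policy that is passed in, the same state can trigger an intervention for one driving policy and not for another, which is the policy-adaptive behavior the abstract describes.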


Related content

Activation function


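In artificial neural networks, the activation function of a node defines the output of that node given an input or a set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that are either "on" (1) or "off" (0), depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using only a small number of nodes, and such activation functions are called nonlinearities.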
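A minimal sketch of the on/off (step) activation described above, next to two common nonlinear activations (sigmoid and ReLU). The XOR example is a standard textbook illustration, not taken from the source: the step function is a piecewise-constant nonlinearity, and three step-activated nodes suffice to compute XOR, which no single node with a linear decision boundary can. The function names are illustrative.

```python
import math
from typing import Callable, Sequence


def step(x: float) -> int:
    """Binary on/off activation: 1 if the input is non-negative, else 0."""
    return 1 if x >= 0 else 0


def sigmoid(x: float) -> float:
    """Smooth nonlinearity squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes negative ones."""
    return max(0.0, x)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor(a: int, b: int) -> int:
    """XOR from three step nodes: (a OR b) AND NOT (a AND b)."""
    h_or = neuron((a, b), (1, 1), -0.5, step)     # fires if at least one input is 1
    h_and = neuron((a, b), (1, 1), -1.5, step)    # fires only if both inputs are 1
    return neuron((h_or, h_and), (1, -1), -0.5, step)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

The loop prints the XOR truth table: the output is 1 only when exactly one input is 1.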
