Due to the limitations of current machine intelligence, autonomous vehicles are still unable to handle all driving situations and cannot yet fully replace human drivers. Because humans exhibit strong robustness and adaptability in complex driving scenarios, it is of great importance to bring humans into the training loop of artificial intelligence, leveraging human intelligence to further advance machine learning algorithms. In this study, a real-time human-guidance-based deep reinforcement learning (Hug-DRL) method is developed for policy training of autonomous driving. Leveraging a newly designed control transfer mechanism between the human and the automation, a human participant can intervene and correct the agent's unreasonable actions in real time when necessary during model training. Based on this human-in-the-loop guidance mechanism, an improved actor-critic architecture with modified policy and value networks is developed. The fast convergence of the proposed Hug-DRL allows real-time human guidance to be fused into the agent's training loop, further improving the efficiency and performance of deep reinforcement learning. The developed method is validated by human-in-the-loop experiments with 40 subjects and compared with other state-of-the-art learning approaches. The results suggest that the proposed method effectively enhances the training efficiency and performance of the deep reinforcement learning algorithm under human guidance, without imposing specific requirements on participants' expertise or experience.
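The control transfer mechanism described above can be sketched in a few lines: during data collection, the executed action is the agent's own output unless the human intervenes, in which case the human's corrective action is executed and flagged in the replay buffer so the subsequent policy update can treat human-guided samples differently. This is a minimal illustrative sketch, not the paper's implementation; the intervention trigger, the placeholder policy, and the stand-in human controller are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def agent_action(state):
    # Placeholder policy: a fixed nonlinearity standing in for the actor network.
    return float(np.tanh(state.sum()))

def human_intervenes(state, action):
    # Hypothetical trigger: the human takes over when the agent's action
    # is near saturation (illustrative threshold, not from the paper).
    return abs(action) > 0.9

def human_action(state):
    # Stand-in for a real-time human correction, bounded like the agent's action.
    return float(np.clip(-state.sum(), -1.0, 1.0))

replay_buffer = []
for step in range(100):
    state = rng.normal(size=2)
    a_agent = agent_action(state)
    if human_intervenes(state, a_agent):
        # Control transfers to the human: the corrective action is what gets
        # executed and stored, flagged so the policy update can weight
        # human-guided transitions differently from autonomous ones.
        a_exec, guided = human_action(state), True
    else:
        a_exec, guided = a_agent, False
    replay_buffer.append((state, a_exec, guided))

n_guided = sum(g for _, _, g in replay_buffer)
print(len(replay_buffer), n_guided)
```

In an actual off-policy training loop, the `guided` flag would typically be consumed by the actor update (e.g., adding a supervised imitation term on human-guided samples) while the critic treats all transitions uniformly.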