A Survey of Deep RL and IL for Autonomous Driving Policy Learning

Autonomous driving (AD) agents generate driving policies based on online perception results; these policies are derived at multiple levels of abstraction, e.g., behavior planning, motion planning and control. Driving policies are crucial to realizing safe, efficient and harmonious driving behaviors, yet AD agents still face substantial challenges in complex scenarios. Owing to their successful application in fields such as robotics and video games, deep reinforcement learning (DRL) and deep imitation learning (DIL) techniques have attracted extensive research effort in recent years as a means of deriving AD policies. This paper is a comprehensive survey of this body of work, conducted at three levels. First, a taxonomy of the literature is constructed from the system perspective, in which five modes of integrating DRL/DIL models into an AD architecture are identified. Second, the formulations of DRL/DIL models for specific AD tasks are comprehensively reviewed, covering various designs of the model state and action spaces and of the reinforcement learning rewards. Finally, an in-depth review is conducted of how DRL/DIL models address the critical issues of AD applications, namely driving safety, interaction with other traffic participants, and uncertainty of the environment. To the best of our knowledge, this is the first survey to focus on AD policy learning using DRL/DIL, addressing it simultaneously from the system, task-driven and problem-driven perspectives. We share and discuss our findings, which may point to various topics for future investigation.