Applying deep reinforcement learning (DRL) to autonomous driving in structured environments such as urban areas poses many challenges, because the massive traffic flows moving along the road network change dynamically. Detecting changes in the intentions of surrounding vehicles and quickly finding a response strategy is therefore a key factor. In this paper, we propose a new framework that effectively combines graph-based intention representation learning and reinforcement learning for kinodynamic planning. Specifically, the movement of dynamic agents is expressed as a graph: the spatio-temporal locality of node features is preserved, and the features are aggregated by considering the interactions between adjacent nodes. We simultaneously learn a motion planner and a controller that share the aggregated information via a safe RL framework. We interpret a given situation with predicted trajectories to generate additional cost signals; these dense cost signals encourage the policy to remain safe under dynamic risk. Moreover, by utilizing data obtained through direct rollouts of the learned policy, robust intention inference is achieved for the various situations encountered during training. We set up a navigation scenario containing diverse situations using CARLA, an urban driving simulator. Experiments show that our approach achieves state-of-the-art performance compared to existing baselines.
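To make the graph-based aggregation concrete, the following is a minimal NumPy sketch of the kind of operation the abstract describes: each surrounding vehicle is a node whose spatio-temporal features are mixed with those of nearby vehicles. The distance threshold, feature layout, and softmax weighting are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def build_adjacency(positions: np.ndarray, radius: float = 20.0) -> np.ndarray:
    """Connect vehicles whose Euclidean distance is below `radius` (self-loops included)."""
    diff = positions[:, None, :] - positions[None, :, :]   # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                   # (N, N) pairwise distances
    return (dist < radius).astype(np.float64)

def aggregate(features: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """One message-passing step: softmax-weighted average over adjacent nodes."""
    scores = features @ features.T                          # (N, N) pairwise similarity
    scores = np.where(adjacency > 0, scores, -np.inf)       # mask out non-neighbours
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features                               # (N, D) aggregated features

# Toy example: 4 vehicles, node features = [x, y, vx, vy] at the current time step.
positions = np.array([[0.0, 0.0], [5.0, 1.0], [40.0, 2.0], [7.0, -3.0]])
velocities = np.array([[3.0, 0.0], [2.0, 0.5], [4.0, 0.0], [1.0, -1.0]])
features = np.hstack([positions, velocities])
adj = build_adjacency(positions)
print(aggregate(features, adj).shape)  # (4, 4)
```

In the paper's framework the aggregated node features would then be shared by the motion planner and the controller; here the sketch only illustrates how interaction between adjacent nodes can be folded into each vehicle's representation.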