Applying reinforcement learning to autonomous driving entails particular challenges, primarily due to dynamically changing traffic flows. To address these challenges, it is necessary to quickly determine response strategies to the changing intentions of surrounding vehicles. This paper proposes a new policy optimization method for safe driving using graph-based interaction-aware constraints. In this framework, the motion prediction and control modules are trained simultaneously while sharing a latent representation that contains a social context. To reflect social interactions, we represent the movements of agents in graph form and filter the features with graph convolutional networks, which helps preserve the spatiotemporal locality of adjacent nodes. Furthermore, we create feedback loops to combine these two modules effectively. As a result, this approach encourages the learned controller to be safe from dynamic risks and renders the motion prediction robust to abnormal movements. In the experiments, we set up a navigation scenario comprising various situations in CARLA, an urban driving simulator. The results show state-of-the-art performance in navigation strategy and motion prediction compared to the baselines.
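To make the described architecture concrete, the following is a minimal sketch (in PyTorch) of a graph-convolutional interaction encoder whose shared latent feeds both a motion-prediction head and a control head. This is not the authors' implementation: all module names, layer sizes, the placeholder adjacency matrix, and the choice of the ego vehicle as agent index 0 are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): agent movements are
# encoded as a graph, filtered by graph convolutions, and the resulting
# latent is shared by a trajectory-prediction head and a policy head.
import torch
import torch.nn as nn


class GraphConvLayer(nn.Module):
    """One graph convolution: aggregate neighbor features through a
    normalized adjacency matrix, then apply a learned linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (batch, num_agents, in_dim)   per-agent motion features
        # adj: (batch, num_agents, num_agents) normalized adjacency with self-loops
        return torch.relu(self.linear(torch.bmm(adj, x)))


class InteractionAwareEncoder(nn.Module):
    """Encodes surrounding agents' movements (graph form) into a social latent."""
    def __init__(self, feat_dim=16, hidden_dim=64, latent_dim=32):
        super().__init__()
        self.gc1 = GraphConvLayer(feat_dim, hidden_dim)
        self.gc2 = GraphConvLayer(hidden_dim, latent_dim)

    def forward(self, agent_feats, adj):
        h = self.gc1(agent_feats, adj)
        return self.gc2(h, adj)  # (batch, num_agents, latent_dim)


class PredictionAndControl(nn.Module):
    """Two heads on the shared latent: trajectory prediction for all agents
    and a control policy for the ego vehicle (assumed to be agent index 0)."""
    def __init__(self, latent_dim=32, horizon=10, action_dim=2):
        super().__init__()
        self.encoder = InteractionAwareEncoder(latent_dim=latent_dim)
        self.predictor = nn.Linear(latent_dim, horizon * 2)  # future (x, y) offsets
        self.policy = nn.Linear(latent_dim, action_dim)      # e.g. steering, throttle

    def forward(self, agent_feats, adj):
        z = self.encoder(agent_feats, adj)           # shared social latent
        pred_traj = self.predictor(z)                # per-agent motion prediction
        action = torch.tanh(self.policy(z[:, 0]))    # ego control from the same latent
        return pred_traj, action


# Usage: 8 agents with 16-dim motion features; a uniform adjacency stands in
# for a real interaction graph built from agent proximity.
model = PredictionAndControl()
feats = torch.randn(1, 8, 16)
adj = torch.full((1, 8, 8), 1.0 / 8)
traj, act = model(feats, adj)
```

Under these assumptions, the two heads are trained jointly on the shared latent; the feedback loops and interaction-aware constraints described in the abstract would be added on top of this skeleton and are omitted here.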