Applying reinforcement learning to autonomous driving entails certain challenges, primarily due to massive, dynamically changing traffic flows. Addressing these challenges requires quickly determining response strategies to the changing intentions of surrounding vehicles. Accordingly, we propose a new policy optimization method for safe driving that uses graph-based interaction-aware constraints. In this framework, the motion prediction and control modules are trained simultaneously while sharing a latent representation that contains a social context. To reflect social interactions, we express the movements of agents in graph form and filter the features, which helps preserve the spatiotemporal locality of adjacent nodes. Furthermore, we create feedback loops to combine these two modules effectively. As a result, this approach keeps the learned controller safe from dynamic risks and renders the motion prediction robust across various situations. In our experiments, we set up a navigation scenario comprising diverse situations in CARLA, an urban driving simulator. The results show state-of-the-art performance in both navigation strategy and motion prediction compared with the baselines.