Autonomous navigation in crowded, complex urban environments requires interacting with other agents on the road. A common solution to this problem is to use a prediction model to guess the likely future actions of other agents. While this is reasonable, it leads to overly conservative plans because it does not explicitly model the mutual influence of the actions of interacting agents. This paper presents a reinforcement learning-based method named MIDAS where an ego-agent learns to affect the control actions of other cars in urban driving scenarios. MIDAS uses an attention mechanism to handle an arbitrary number of other agents and includes a "driver-type" parameter to learn a single policy that works across different planning objectives. We build a simulation environment that enables diverse interaction experiments with a large number of agents and provides methods for quantitatively studying the safety, efficiency, and interaction among vehicles. MIDAS is validated using extensive experiments, and we show that it (i) can work across different road geometries, (ii) results in an adaptive ego policy that can be tuned easily to satisfy performance criteria such as aggressive or cautious driving, (iii) is robust to changes in the driving policies of external agents, and (iv) is more efficient and safer than existing approaches to interaction-aware decision-making.
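The abstract describes two architectural ideas: an attention mechanism that pools a variable number of other-agent features into a fixed-size input, and a scalar "driver-type" parameter that conditions a single policy on different planning objectives. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation; the function names, feature layout, and the use of plain scaled dot-product attention are assumptions.

```python
import math

def attention_pool(ego, agents):
    """Hypothetical sketch: the ego feature vector queries a variable-length
    list of other-agent feature vectors via scaled dot-product attention,
    returning one fixed-size summary. This is how an attention mechanism
    lets the policy input size stay constant regardless of agent count."""
    d = len(ego)
    # Similarity score between the ego and each other agent.
    scores = [sum(e * a for e, a in zip(ego, ag)) / math.sqrt(d) for ag in agents]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Attention-weighted sum of the other-agent features.
    return [sum(w * ag[i] for w, ag in zip(weights, agents)) for i in range(d)]

def policy_input(ego, agents, driver_type):
    """Hypothetical sketch: concatenate ego features, the attention summary,
    and the scalar driver-type parameter that conditions one shared policy
    on an objective (e.g. cautious vs. aggressive driving)."""
    return ego + attention_pool(ego, agents) + [driver_type]

# The policy input has the same length with 2 or 5 surrounding agents.
x2 = policy_input([1.0, 0.0], [[0.5, 0.5], [0.2, 0.8]], driver_type=0.3)
x5 = policy_input([1.0, 0.0], [[0.1 * i, 0.2] for i in range(5)], driver_type=0.9)
```

Because the attention weights form a convex combination, the pooled summary stays within the range of the individual agents' features, and only the driver-type scalar needs to change to request different ego behavior from the same trained policy.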