Motion Planning for Autonomous Vehicles in the Presence of Uncertainty Using Reinforcement Learning

Motion planning under uncertainty is one of the main challenges in developing autonomous vehicles. In this work, we focus on uncertainty in sensing and perception that results from a limited field of view, occlusions, and limited sensing range. This problem is often tackled by assuming hypothetical hidden objects in occluded areas or beyond the sensing range in order to guarantee passive safety. However, this can lead to conservative planning and expensive computation, particularly when many hypothetical objects must be considered. We propose a reinforcement learning (RL) based solution that manages uncertainty by optimizing for the worst-case outcome. This contrasts with traditional RL, where the agent tries to maximize the expected reward. The proposed approach builds on distributional RL, with the policy optimized to maximize a lower bound of the stochastic outcomes rather than their mean. This modification can be applied to a range of RL algorithms. As a proof of concept, we apply it to two different RL algorithms, Soft Actor-Critic and DQN. The approach is evaluated on two challenging scenarios: pedestrians crossing under occlusion, and curved roads with a limited field of view. The algorithm is trained and evaluated using the SUMO traffic simulator. The proposed approach yields much better motion planning behavior than conventional RL algorithms and behaves comparably to a human driving style.
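
To make the difference between mean-based and lower-bound-based action selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes a discrete action set (as in DQN) and assumes that the distributional critic outputs a fixed number of return quantiles per action. The function names, the `tail_fraction` parameter, and the numeric values are hypothetical and chosen only for illustration. The abstract does not say how the lower bound is computed, so the mean of the lowest quantiles is used here as a stand-in.

```python
import numpy as np


def mean_greedy_action(quantiles):
    """Conventional choice: pick the action with the highest mean return."""
    return int(np.argmax(quantiles.mean(axis=1)))


def lower_bound_action(quantiles, tail_fraction=0.1):
    """Risk-averse choice: pick the action whose worst outcomes are best.

    The lower bound is approximated (an assumption of this sketch) by the
    mean of the lowest `tail_fraction` of each action's return quantiles.
    """
    n_quantiles = quantiles.shape[1]
    k = max(1, int(np.ceil(tail_fraction * n_quantiles)))
    sorted_q = np.sort(quantiles, axis=1)           # ascending per action
    lower_bounds = sorted_q[:, :k].mean(axis=1)     # average of the worst k
    return int(np.argmax(lower_bounds))


# Hypothetical return quantiles for two actions near an occluded crosswalk.
# Row 0: keep speed (usually good, rarely a collision). Row 1: slow down.
quantiles = np.array([
    [-50.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0],
    [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
])

print("mean-greedy action:", mean_greedy_action(quantiles))
print("lower-bound action:", lower_bound_action(quantiles))
```

In this toy example the mean-greedy rule keeps speed because the rare collision is averaged away, while the lower-bound rule slows down. For a continuous-action method such as Soft Actor-Critic, the same idea would presumably enter through the critic term of the policy objective rather than through an argmax.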
