Despite advances in hierarchical reinforcement learning, applying it to path planning for autonomous highway driving remains challenging. One reason is that conventional hierarchical reinforcement learning approaches are not well suited to autonomous driving because of its riskiness: the agent must move while avoiding multiple obstacles, such as other agents that are highly unpredictable, so safe regions are small, scattered, and change over time. To overcome this challenge, we propose a spatially hierarchical reinforcement learning method over the state space and the policy space. The high-level policy selects not only a behavioral sub-policy but also a region: one to attend to in the state space and to outline in the policy space. Subsequently, the low-level policy elaborates the agent's short-term goal position within the outline of the region selected by the high-level command. The network structure and optimization used in our method are as concise as those of single-level methods. Experiments in environments with roads of various shapes showed that our method finds near-optimal policies from early episodes, outperforming a baseline hierarchical reinforcement learning method, especially on narrow and complex roads. The resulting trajectories resembled human driving strategies at the behavioral planning level.
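To make the two-level decision flow concrete, the following is a minimal Python sketch of one decision step, assuming hypothetical names (`HighLevelPolicy`, `LowLevelPolicy`) and toy placeholder scoring in place of the learned networks; it illustrates the structure described above, not the authors' actual implementation.

```python
import numpy as np

# Hypothetical sketch of the two-level decision flow: the high-level policy
# picks a behavioral sub-policy and a region; the low-level policy then
# elaborates a short-term goal position inside that region's outline.

class HighLevelPolicy:
    """Selects a behavioral sub-policy and a region to attend to."""
    def act(self, state, candidate_regions):
        # Placeholder scoring; in the paper this would be a learned network.
        scores = [np.random.rand() for _ in candidate_regions]
        region = candidate_regions[int(np.argmax(scores))]
        # e.g., 0: keep lane, 1: change left, 2: change right (illustrative only)
        sub_policy_id = int(np.random.randint(3))
        return sub_policy_id, region

class LowLevelPolicy:
    """Elaborates a short-term goal position within the selected region."""
    def act(self, state, region):
        # region is (x_min, x_max, y_min, y_max); pick a goal inside it.
        x = np.random.uniform(region[0], region[1])
        y = np.random.uniform(region[2], region[3])
        return np.array([x, y])

# One decision step: high-level command, then low-level goal within the region.
high, low = HighLevelPolicy(), LowLevelPolicy()
state = np.zeros(8)  # stand-in for ego and obstacle features
regions = [(0.0, 5.0, -1.5, 1.5), (5.0, 10.0, -1.5, 1.5)]  # candidate safe regions ahead
sub_policy, region = high.act(state, regions)
goal = low.act(state, region)
print(sub_policy, region, goal)
```

Constraining the low-level goal to lie inside the high-level region is what keeps the agent within the small, scattered safe regions the abstract describes.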