Competent multi-lane cruising requires using lane changes and within-lane maneuvers to achieve good speed and maintain safety. This paper proposes a design for autonomous multi-lane cruising that combines a hierarchical reinforcement learning framework with a novel state-action space abstraction. While the proposed solution follows the classical hierarchy of behavior decision, motion planning, and control, it introduces a key intermediate abstraction within the motion planner that discretizes the state-action space according to high-level behavioral decisions. We argue that this design allows principled, modular extension of motion planning, in contrast to using either monolithic behavior cloning or a large set of hand-written rules. Moreover, we demonstrate that our state-action space abstraction allows the trained models to be transferred, without retraining, from a simulated environment with virtually no dynamics to one with significantly more realistic dynamics. Together, these results suggest that our proposed hierarchical architecture is a promising way to apply reinforcement learning to complex multi-lane cruising in the real world.