Autonomous driving has attracted significant research interest over the past two decades because it offers many potential benefits, including relieving drivers of exhausting driving and mitigating traffic congestion. Despite promising progress, lane changing remains a major challenge for autonomous vehicles (AVs), especially in mixed and dynamic traffic scenarios. Recently, reinforcement learning (RL), a powerful data-driven control method, has been widely explored for lane-changing decision making in AVs, with encouraging results. However, most of these studies focus on a single-vehicle setting, and lane changing in the context of multiple AVs coexisting with human-driven vehicles (HDVs) has received scarce attention. In this paper, we formulate the lane-changing decision making of multiple AVs in a mixed-traffic highway environment as a multi-agent reinforcement learning (MARL) problem, where each AV makes lane-changing decisions based on the motions of both neighboring AVs and HDVs. Specifically, a multi-agent advantage actor-critic network (MA2C) is developed with a novel local reward design and a parameter-sharing scheme. In particular, a multi-objective reward function is proposed to incorporate fuel efficiency, driving comfort, and safety of autonomous driving. Comprehensive experimental results, conducted under three different traffic densities and various levels of human driver aggressiveness, show that our proposed MARL framework consistently outperforms several state-of-the-art benchmarks in terms of efficiency, safety, and driver comfort.