Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios

Autonomous driving has recently made substantial progress on the most common traffic scenarios, such as intersection navigation and lane changing. However, most of these successes are limited to scenarios that have well-defined traffic rules and require minimal negotiation with other vehicles. In this paper, we introduce a previously unconsidered, yet everyday, high-conflict driving scenario that requires negotiation between agents of equal rights and priorities. There is no centralized control structure, and we do not allow communication between agents. It is therefore unknown whether other drivers are willing to cooperate and, if so, to what extent. We train policies to robustly negotiate with opposing vehicles of an unobservable degree of cooperativeness using multi-agent reinforcement learning (MARL). We propose Discrete Asymmetric Soft Actor-Critic (DASAC), a maximum-entropy off-policy MARL algorithm that allows for centralized training with decentralized execution. We show that, using DASAC, our agents successfully negotiate and traverse the considered scenario over 99% of the time. Our agents are robust to an unknown timing of opponent decisions, an unobservable degree of cooperativeness of the opposing vehicle, and previously unencountered policies. Furthermore, they learn to exhibit human-like behaviors such as defensive driving, anticipating solution options, and interpreting the behavior of other agents.
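
The abstract does not spell out DASAC's update rules, so the following is only a rough illustration of what "maximum-entropy" means for a discrete action space. It computes the soft state value used in standard discrete soft actor-critic formulations, V(s) = Σ_a π(a|s) [Q(s,a) − α log π(a|s)], which may differ from what DASAC actually does. The function names, the three-action example, and all numbers are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw action preferences into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_state_value(q_values, logits, alpha):
    """Soft value for a discrete policy (generic discrete SAC, assumed here).

    V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s))
    alpha is the entropy temperature; alpha = 0 recovers the expected Q-value.
    """
    probs = softmax(logits)
    # Skip zero-probability actions: p * log(p) tends to 0 as p tends to 0.
    return sum(
        p * (q - alpha * math.log(p))
        for p, q in zip(probs, q_values)
        if p > 0.0
    )


# Hypothetical three-action example (say: yield, hold, advance); values are made up.
q_values = [1.0, 0.5, -0.2]
logits = [0.0, 0.0, 0.0]  # a uniform policy over the three actions
value = soft_state_value(q_values, logits, alpha=0.2)
```

With a uniform policy the entropy bonus is at its maximum, so `value` exceeds the plain expected Q-value; setting `alpha=0` removes the bonus entirely.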
