Autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HVs) are expected to coexist on the same roads. The safety and reliability of AVs will depend on their social awareness and their ability to engage in complex social interactions in a socially acceptable manner. However, AVs remain inefficient at cooperating with HVs and struggle to understand and adapt to human behavior, which is particularly challenging in mixed autonomy. On a road shared by AVs and HVs, the social preferences and individual traits of HVs are unknown to the AVs. Unlike AVs, which are expected to follow a policy, HVs are particularly difficult to forecast because they do not necessarily follow a stationary policy. To address these challenges, we frame the mixed-autonomy problem as a multi-agent reinforcement learning (MARL) problem and propose an approach that allows AVs to learn the decision-making of HVs implicitly from experience, account for all vehicles' interests, and safely adapt to new traffic situations. In contrast with existing work, we quantify AVs' social preferences and propose a distributed reward structure that introduces altruism into their decision-making process, allowing the altruistic AVs to learn to establish coalitions and influence the behavior of HVs.
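A common way to quantify an agent's social preference, which the distributed reward structure described above could build on, is a social value orientation (SVO) angle that blends the ego reward with the rewards of surrounding vehicles. The sketch below illustrates this general formulation only; the function name, parameters, and the use of a simple mean over other agents are assumptions for illustration, not the paper's exact reward design.

```python
import math

def social_reward(ego_reward, others_rewards, svo_angle):
    """Blend an agent's own reward with the mean reward of the other
    vehicles, weighted by a social value orientation (SVO) angle:
      svo_angle = 0      -> purely egoistic behavior
      svo_angle = pi/2   -> purely altruistic behavior
    """
    if others_rewards:
        others_mean = sum(others_rewards) / len(others_rewards)
    else:
        others_mean = 0.0
    return math.cos(svo_angle) * ego_reward + math.sin(svo_angle) * others_mean
```

With this shaping, an egoistic AV (angle 0) optimizes only its own return, while an altruistic AV (angle near pi/2) is rewarded for outcomes that benefit the surrounding HVs, which is the mechanism that lets altruistic AVs learn coalition-like, socially beneficial behavior.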