Despite rapid advances in autonomous driving, autonomous vehicles (AVs) remain inefficient and limited in their ability to cooperate with each other or to coordinate with human-driven vehicles (HVs). A group of AVs and HVs that work together to optimize an altruistic social utility -- as opposed to an egoistic individual utility -- can co-exist seamlessly and ensure safety and efficiency on the road. Achieving this goal is challenging in the absence of explicit coordination among agents. Additionally, the presence of humans in mixed-autonomy environments creates social dilemmas, because humans are known to be heterogeneous in their social preferences and their behavior is inherently hard to predict. Formally, we model an AV's maneuver planning in mixed-autonomy traffic as a partially-observable stochastic game and attempt to derive optimal policies that lead to socially desirable outcomes using our multi-agent reinforcement learning framework. We introduce a quantitative representation of the AVs' social value orientation and design a distributed reward structure that induces altruism in their decision-making process. Our trained altruistic AVs are able to form alliances, guide the traffic, and affect the behavior of the HVs to handle conflicting and competitive driving scenarios. As a case study, we compare egoistic AVs to our altruistic autonomous agents in a highway merging scenario and demonstrate a significant improvement in the number of successful merges as well as in overall traffic flow and safety.
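To make the idea of a quantitative social value orientation (SVO) more concrete, the following is a minimal sketch of one way such a reward could be blended. It assumes the angular parameterization that is common in the SVO literature, in which an angle trades off an agent's own reward against the rewards of surrounding vehicles; the abstract itself does not state the exact form. The function name `svo_reward`, its parameters, and the use of a simple mean over neighbors are all hypothetical choices made for illustration.

```python
import math


def svo_reward(ego_reward, other_rewards, svo_angle):
    """Blend an agent's own reward with the mean reward of nearby vehicles.

    Assumption (not taken from the abstract): the SVO is an angle in
    radians, where 0 is purely egoistic and pi/2 is purely altruistic.
    """
    # Hypothetical aggregation: average the rewards of the other vehicles.
    if other_rewards:
        social_reward = sum(other_rewards) / len(other_rewards)
    else:
        social_reward = 0.0
    # Weight the egoistic and social terms by the cosine and sine of the angle.
    return math.cos(svo_angle) * ego_reward + math.sin(svo_angle) * social_reward


if __name__ == "__main__":
    # An egoistic agent ignores its neighbors; an altruistic one ignores itself.
    print(svo_reward(1.0, [0.0, 2.0], 0.0))
    print(svo_reward(1.0, [3.0, 5.0], math.pi / 2))
```

Under this assumed form, each AV computes its reward locally from its own outcome and those of the vehicles it observes, which is one way a reward structure could be distributed across agents rather than assigned by a central coordinator.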