Building autonomous vehicles (AVs) is a complex problem, but enabling them to operate in the real world, where they will be surrounded by human-driven vehicles (HVs), is extremely challenging. Prior work has shown the possibility of creating inter-agent cooperation among a group of AVs that follow a social utility. Such altruistic AVs can form alliances and influence the behavior of HVs to achieve socially desirable outcomes. We identify two major challenges in the co-existence of AVs and HVs. First, the social preferences and individual traits of a given human driver, e.g., selflessness and aggressiveness, are unknown to an AV, and it is almost impossible to infer them in real time during a short AV-HV interaction. Second, unlike AVs, which are expected to follow a policy, HVs do not necessarily follow a stationary policy and are therefore extremely hard to predict. To address these challenges, we formulate the mixed-autonomy problem as a multi-agent reinforcement learning (MARL) problem and propose a decentralized framework and reward function for training cooperative AVs. Our approach enables AVs to learn the decision-making of HVs implicitly from experience and to optimize a social utility while prioritizing safety and adaptability, making altruistic AVs robust to different human behaviors and constraining them to a safe action space. Finally, we investigate the robustness, safety, and sensitivity of AVs to various HV behavioral traits and present the settings in which AVs can learn cooperative policies that are adaptable to different situations.
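To make the notion of optimizing a social utility concrete, the sketch below shows one common way such a reward can be composed for a single altruistic AV: a weighted combination of the agent's own reward and the rewards of surrounding vehicles. The function name, the social-orientation angle, and the averaging over neighbors are illustrative assumptions, not the paper's exact reward formulation.

```python
import numpy as np

def social_reward(ego_reward: float,
                  other_rewards: np.ndarray,
                  svo_angle: float = np.pi / 4) -> float:
    """Hypothetical social-utility reward for one altruistic AV.

    Blends the AV's own (egoistic) reward with the mean reward of the
    surrounding vehicles, weighted by a social-orientation angle:
      svo_angle = 0     -> purely selfish behavior
      svo_angle = pi/2  -> purely altruistic behavior
    """
    # Average the rewards of neighboring vehicles (AVs and HVs alike);
    # an empty neighborhood contributes nothing to the social term.
    others = float(np.mean(other_rewards)) if len(other_rewards) else 0.0
    return np.cos(svo_angle) * ego_reward + np.sin(svo_angle) * others

# Example: an AV that yields (small ego reward) so that two neighbors
# merge smoothly (large neighbor rewards) still scores well socially.
print(social_reward(0.2, np.array([0.9, 0.8])))
```

Safety prioritization, as described in the abstract, would then enter separately, e.g., by penalizing or masking actions outside a safe action set rather than by this weighting alone.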