We consider the interaction among agents engaged in a driving task and model it as a general-sum game. This class of games exhibits a plurality of different equilibria, raising the issue of equilibrium selection. While selecting the most efficient equilibrium (in terms of social cost) is often computationally impractical, in this work we study the (in)efficiency of any equilibrium the players might agree to play. More specifically, we bound the equilibrium inefficiency by modeling driving games as a particular type of congestion game over spatio-temporal resources. We obtain novel guarantees that refine existing bounds on the Price of Anarchy (PoA) as a function of problem-dependent game parameters, for instance the relative trade-off between proximity costs and personal objectives such as comfort and progress. Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies trained via decentralized multi-agent reinforcement learning.
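The Price of Anarchy mentioned above (worst-equilibrium social cost divided by optimal social cost) can be illustrated with Pigou's classic two-link congestion game. This is a standard textbook instance, not an example from this work, sketched here only to make the PoA notion concrete:

```python
# Pigou's two-link network: one unit of traffic chooses between
# link A with latency l_A(x) = x and link B with constant latency l_B(x) = 1.
# (Hypothetical toy instance to illustrate the Price of Anarchy.)

def social_cost(x_a: float) -> float:
    """Total latency when a fraction x_a of the traffic uses link A."""
    x_b = 1.0 - x_a
    return x_a * x_a + x_b * 1.0  # x_a * l_A(x_a) + x_b * l_B(x_b)

# Nash equilibrium: every driver takes link A (its latency never exceeds 1),
# so the equilibrium social cost is 1.
nash_cost = social_cost(1.0)

# Social optimum: minimize the total cost over splits (grid search suffices).
opt_cost = min(social_cost(i / 1000) for i in range(1001))

poa = nash_cost / opt_cost  # equals 4/3, the tight bound for linear latencies
print(f"PoA = {poa:.3f}")
```

The 4/3 bound here is the classic worst case for linear latencies; the guarantees in this work analogously tighten such bounds using driving-specific parameters.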