Many real-world strategic games involve interactions among multiple players. We study a hierarchical multi-player game structure in which players with asymmetric roles can be separated into leaders and followers, a setting often referred to as a Stackelberg game or leader-follower game. In particular, we focus on a Stackelberg game scenario with multiple leaders and a single follower, called the Multi-Leader-Single-Follower (MLSF) game. We propose a novel asymmetric equilibrium concept for the MLSF game called Correlated Stackelberg Equilibrium (CSE). We design online learning algorithms that enable the players to interact in a distributed manner, and prove that they achieve no-external-Stackelberg-regret learning. This further translates into convergence to an approximate CSE via a reduction from no-external regret to no-swap regret. At the core of our work, we solve the intricate problem of how to learn equilibria in leader-follower games with noisy bandit feedback by balancing exploration and exploitation across different learning structures.
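To make the bandit-feedback component concrete, below is a minimal sketch of a standard no-regret bandit learner (EXP3), shown only as an illustration of exploration-exploitation under noisy bandit feedback; it is not the paper's algorithm, and names such as `n_arms` and `reward_fn` are hypothetical.

```python
import math
import random

def exp3(n_arms, n_rounds, reward_fn, gamma=0.1):
    """Standard EXP3 learner for adversarial bandit feedback.

    reward_fn(t, arm) should return a reward in [0, 1]; only the reward of
    the pulled arm is observed, mirroring the bandit-feedback setting.
    """
    weights = [1.0] * n_arms
    history = []
    for t in range(n_rounds):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)          # noisy bandit feedback
        estimate = reward / probs[arm]      # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        history.append((arm, reward))
    return history

# Toy usage: a stochastic environment where the last arm is best on average.
if __name__ == "__main__":
    means = [0.3, 0.5, 0.8]
    plays = exp3(3, 5000, lambda t, a: float(random.random() < means[a]))
    print("average reward:", sum(r for _, r in plays) / len(plays))
```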