Two-player mean-payoff Stackelberg games are nonzero-sum infinite-duration games played on a bi-weighted graph by Leader (Player 0) and Follower (Player 1). Such games are played sequentially: first, Leader announces her strategy; second, Follower chooses a best response to it. If we cannot impose which best response Follower chooses, we say that Follower, though strategic, is adversarial towards Leader. The maximal value that Leader can obtain in this nonzero-sum game is called the adversarial Stackelberg value (ASV) of the game. We study the robustness of strategies for Leader in these games against two types of deviations: (i) modeling imprecision: the weights on the edges of the game arena may not be exactly correct and may deviate by up to delta from the correct values; (ii) sub-optimal response: Follower may play epsilon-optimal best responses instead of perfect best responses. First, we show that if the game is zero-sum then robustness is guaranteed, while in the nonzero-sum case optimal strategies for ASV are fragile. Second, we provide a solution concept that yields strategies for Leader that are robust both to modeling imprecision and to the epsilon-optimal responses of Follower, and we study several properties and algorithmic problems related to this solution concept.