We study the problem of efficiently computing optimal strategies in asymmetric leader-follower games repeated a finite number of times, a setting that presents technical challenges distinct from the infinite-horizon case. More precisely, we give efficient algorithms for finding approximate Stackelberg equilibria in finite-horizon repeated two-player games, along with rates of convergence in the horizon $T$. We give two algorithms: one computing strategies with an optimal $\frac{1}{T}$ rate at the expense of an exponential dependence on the number of actions, and another (randomized) approach computing strategies with no dependence on the number of actions but a worse $\frac{1}{T^{1/4}}$ dependence on $T$. Both algorithms build upon a linear program to produce simple automaton leader strategies and induce corresponding automaton best responses for the follower. We complement these results by showing that approximating the Stackelberg value in three-player finite-horizon repeated games is computationally hard, via a reduction from the balanced vertex cover problem.