We study the sample complexity of identifying an approximate equilibrium for two-player zero-sum $n\times 2$ matrix games. That is, in a sequence of repeated game plays, how many rounds must the two players play before reaching an approximate equilibrium (e.g., Nash)? We derive instance-dependent bounds that define an ordering over game matrices that captures the intuition that the dynamics of some games converge faster than others. Specifically, we consider a stochastic observation model such that when the two players choose actions $i$ and $j$, respectively, they both observe each other's played actions and a stochastic observation $X_{ij}$ such that $\mathbb E[ X_{ij}] = A_{ij}$. To our knowledge, our work is the first case of instance-dependent lower bounds on the number of rounds the players must play before reaching an approximate equilibrium in the sense that the number of rounds depends on the specific properties of the game matrix $A$ as well as the desired accuracy. We also prove a converse statement: there exist player strategies that achieve this lower bound.
翻译:本文研究了识别近似均衡状态的两人零和 $n\times 2$ 矩阵博弈的样本复杂度。也就是说,在一系列重复的游戏过程中,在达到近似均衡状态(如 Nash 状态)之前,两个玩家需要进行多少轮游戏。我们推导了基于实例的界限,定义了一个博弈矩阵的排序,捕捉某些博弈的动态比其他博弈更快地收敛这一直觉。具体而言,我们考虑一种随机观察模型,即当两个玩家选择动作 $i$ 和 $j$ 时,他们都会观察到对方的动作以及一个随机观察值 $X_{ij}$,使得 $\mathbb{E}[X_{ij}]=A_{ij}$。据我们所知,本文是首次对于识别近似均衡状态下,玩家必须进行的轮数的实例依赖性下界的探索,这是因为轮数取决于博弈矩阵 $A$ 的特定属性和所要求的精度。我们还证明了一种如下的转化语句:存在玩家策略可以实现这个下界。