In this article we analyze a partial-information Nash Q-learning algorithm for general 2-player stochastic games. Partial information refers to the setting in which a player observes neither the strategy nor the actions of the opposing player. We prove convergence of this partially informed algorithm for general 2-player games with finitely many states and actions, and we confirm that the limiting strategy is in fact a full-information Nash equilibrium. In implementation, partial information offers simplicity because it avoids computing a Nash equilibrium at every time step. In contrast, full-information Q-learning uses the Lemke-Howson algorithm to compute a Nash equilibrium at every time step, which can be an effective approach but requires several assumptions to prove convergence and may fail at runtime if Lemke-Howson encounters a degenerate game. In simulations, the results we obtain under partial information are comparable to those for full-information Q-learning and fictitious play.
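To make the algorithmic contrast concrete, the following is a minimal sketch of a partial-information Q-update in a toy tabular setting. This is an illustration under assumed conventions, not the paper's exact scheme; the names n_states, n_actions, alpha, gamma, and epsilon are hypothetical. The point it shows is that each player updates its own Q-table from its own action and realized reward alone, with no opponent model and no stage-game Nash computation, so no Lemke-Howson call is ever needed.

```python
import numpy as np

# Illustrative sketch (not the paper's exact update rule): each player keeps
# its own tabular Q-function and updates it from its own action and observed
# reward only. The opponent's unobserved action is absorbed into the
# randomness of the reward and the state transition.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 3          # toy problem size (assumed)
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = [np.zeros((n_states, n_actions)) for _ in range(2)]  # one table per player

def choose_action(player: int, s: int) -> int:
    """Epsilon-greedy over the player's own Q-table; no opponent model."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[player][s]))

def update(player: int, s: int, a: int, r: float, s_next: int) -> None:
    """One-player Q-update: no Nash equilibrium is computed at any step."""
    target = r + gamma * np.max(Q[player][s_next])
    Q[player][s, a] += alpha * (target - Q[player][s, a])
```

Full-information Nash Q-learning would instead replace the max in the target with the value of a Nash equilibrium of the stage game at s_next, computed (e.g., via Lemke-Howson) from both players' Q-tables; the sketch above avoids that step entirely.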