We analyze the convergence properties of a two-timescale fictitious play that combines classical fictitious play with Q-learning for two-player zero-sum stochastic games with player-dependent learning rates. We show its almost sure convergence under the standard assumptions of two-timescale stochastic approximation when the discount factor is less than the product of the ratios of the player-dependent step sizes. To this end, we formulate a novel Lyapunov function and present a one-sided asynchronous convergence result.
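To make the dynamic concrete, the following is a minimal, hypothetical sketch of a two-timescale fictitious play with a Q-learning-style value update, collapsed to a single-state zero-sum matrix game (matching pennies) so that it is self-contained. All names (`r`, `Q`, `pi`, `alpha`, `beta`, `gamma`) and the specific step-size schedules and value estimate are illustrative assumptions, not the paper's exact update rules; it only shows the structure of fast, player-dependent belief updates paired with a slower value update.

```python
import numpy as np

gamma = 0.2  # discount factor, chosen small in the spirit of the convergence condition
r = np.array([[1.0, -1.0], [-1.0, 1.0]])  # stage payoff to player 1 (matching pennies)

Q = np.zeros((2, 2))                # player 1's Q estimate; player 2 uses -Q
pi = [np.ones(2) / 2, np.ones(2) / 2]  # empirical beliefs about each player's play

for t in range(1, 50001):
    # each player best-responds to its belief about the opponent
    a1 = int(np.argmax(Q @ pi[1]))
    a2 = int(np.argmax(-(Q.T) @ pi[0]))

    # fast timescale: player-dependent step sizes on the belief updates
    alpha = [1.0 / t, 0.8 / t]
    pi[0] += alpha[0] * (np.eye(2)[a1] - pi[0])
    pi[1] += alpha[1] * (np.eye(2)[a2] - pi[1])

    # slow timescale: a crude Q-learning-style update toward the one-stage
    # payoff plus the discounted value under the current beliefs
    beta = 1.0 / (t * np.log(t + 2))
    v = pi[0] @ Q @ pi[1]
    Q += beta * (r + gamma * v - Q)

# beliefs are expected to approach the mixed equilibrium near (1/2, 1/2)
print(np.round(pi[0], 2), np.round(pi[1], 2))
```

The separation of timescales is carried by the step sizes: the belief steps `alpha` decay like `1/t` while the value step `beta` decays faster, so the Q estimate moves slowly relative to the beliefs, mirroring the two-timescale stochastic approximation structure the abstract refers to.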