Fictitious play in zero-sum stochastic games

We present a novel variant of fictitious play dynamics that combines classical fictitious play with Q-learning for stochastic games, and we analyze its convergence properties in two-player zero-sum stochastic games. In our dynamics, each player forms beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and plays a greedy best response using the estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated at a slower timescale than the beliefs on strategies. We show that, in both the model-based case and the model-free case (without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
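The dynamics described above can be illustrated with a minimal model-based sketch. It is not the authors' exact algorithm: the step sizes `alpha` and `beta`, the uniform initial beliefs, the zero-initialized Q-functions, and all function and variable names are assumptions made for illustration. The only properties taken from the abstract are that each player best-responds greedily to its belief using its own Q-function, and that the Q-function beliefs move on a slower timescale than the strategy beliefs (here `beta / alpha` tends to 0).

```python
import math
import random


def fictitious_play(r, p, gamma=0.9, steps=5000, seed=0):
    """Two-timescale fictitious play sketch (model-based, hypothetical step sizes).

    r[s][a][b]     : stage payoff to player 1; player 2 receives -r[s][a][b]
    p[s][a][b][s2] : probability of moving from state s to s2 under (a, b)
    """
    rng = random.Random(seed)
    n_s, n_a, n_b = len(r), len(r[0]), len(r[0][0])
    # Beliefs on the opponent's mixed strategy in each state (uniform start).
    belief_on_2 = [[1.0 / n_b] * n_b for _ in range(n_s)]  # held by player 1
    belief_on_1 = [[1.0 / n_a] * n_a for _ in range(n_s)]  # held by player 2
    # q[i][s][a][b]: player i's belief on its own continuation payoff.
    q = [[[[0.0] * n_b for _ in range(n_a)] for _ in range(n_s)]
         for _ in range(2)]
    visits = [0] * n_s

    def values_1(s):
        # Player 1's expected Q for each own action under its belief.
        return [sum(belief_on_2[s][b] * q[0][s][a][b] for b in range(n_b))
                for a in range(n_a)]

    def values_2(s):
        # Player 2's expected Q for each own action under its belief.
        return [sum(belief_on_1[s][a] * q[1][s][a][b] for a in range(n_a))
                for b in range(n_b)]

    s = 0
    for _ in range(steps):
        # Greedy best responses to current beliefs.
        v1, v2 = values_1(s), values_2(s)
        a = max(range(n_a), key=lambda i: v1[i])
        b = max(range(n_b), key=lambda j: v2[j])

        visits[s] += 1
        n = visits[s]
        alpha = 1.0 / (n + 1)                     # fast: strategy beliefs
        beta = 1.0 / ((n + 1) * math.log(n + 2))  # slow: Q-function beliefs

        # Fast timescale: move beliefs toward the observed opponent action.
        for j in range(n_b):
            belief_on_2[s][j] += alpha * ((1.0 if j == b else 0.0) - belief_on_2[s][j])
        for i in range(n_a):
            belief_on_1[s][i] += alpha * ((1.0 if i == a else 0.0) - belief_on_1[s][i])

        # Slow timescale: model-based update of Q in the current state.
        cont_1 = [max(values_1(s2)) for s2 in range(n_s)]
        cont_2 = [max(values_2(s2)) for s2 in range(n_s)]
        for i in range(n_a):
            for j in range(n_b):
                nxt = p[s][i][j]
                t1 = r[s][i][j] + gamma * sum(nxt[s2] * cont_1[s2] for s2 in range(n_s))
                t2 = -r[s][i][j] + gamma * sum(nxt[s2] * cont_2[s2] for s2 in range(n_s))
                q[0][s][i][j] += beta * (t1 - q[0][s][i][j])
                q[1][s][i][j] += beta * (t2 - q[1][s][i][j])

        # Sample the next state from the known transition model.
        s = rng.choices(range(n_s), weights=p[s][a][b])[0]

    return belief_on_1, belief_on_2, q


# Hypothetical two-state game with uniform transitions, for illustration only.
r_demo = [[[1.0, -1.0], [-1.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]]]
p_demo = [[[[0.5, 0.5] for _ in range(2)] for _ in range(2)] for _ in range(2)]
b1_demo, b2_demo, _ = fictitious_play(r_demo, p_demo, steps=2000)
print(b1_demo, b2_demo)
```

In a model-free variant, the Q update would instead use only the observed payoff and the observed next state rather than `r` and `p`; the sketch above covers the model-based case only.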

Related content

Victoria Transport Policy Institute, "Autonomous Vehicle Implementation Predictions: Implications for Transport Planning"

专知 member service · February 16, 2022

[Classic book] Applied Stochastic Differential Equations (324-page PDF)

专知 member service · November 21, 2020

A Game Theoretic Framework for Model Based Reinforcement Learning

专知 member service · April 19, 2020