Fictitious play in zero-sum stochastic games

We present fictitious play dynamics for the general class of stochastic games and analyze their convergence properties in zero-sum stochastic games. In these dynamics, agents form beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and play a myopic best response using the estimated continuation payoffs. Agents update their beliefs at the states visited, based on observations of opponent actions. A key property of the learning dynamics is that the beliefs on Q-functions are updated on a slower timescale than the beliefs on strategies. We show that in both the model-based and model-free cases (the latter without knowledge of agent payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
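To make the two-timescale idea concrete, here is a minimal Python sketch of the model-based case. It is an illustration written from the abstract alone, not the authors' exact algorithm. The function names (`best_response`, `fictitious_play`), the data layout, and the step-size schedules `alpha` and `beta` are all assumptions; the only property taken from the text is that the Q-function beliefs move more slowly than the strategy beliefs (here `beta / alpha` tends to 0).

```python
import math
import random


def best_response(q_s, belief_s):
    """Return (action, value) maximizing expected Q against the opponent belief."""
    values = [sum(q * p for q, p in zip(row, belief_s)) for row in q_s]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]


def fictitious_play(reward, trans, gamma=0.8, steps=20000, seed=0):
    """Two-player zero-sum sketch (hypothetical layout, chosen for illustration).

    reward[s][a][b]: stage payoff of player 0 (player 1 receives the negative).
    trans[s][a][b]:  list of next-state probabilities.
    Returns (belief, q), both indexed by player first.
    """
    rng = random.Random(seed)
    n_states = len(reward)
    n_act = [len(reward[0]), len(reward[0][0])]
    # pay[i][s][own][opp]: each player's payoff from its own point of view.
    pay = [
        [[[reward[s][a][b] for b in range(n_act[1])] for a in range(n_act[0])]
         for s in range(n_states)],
        [[[-reward[s][a][b] for a in range(n_act[0])] for b in range(n_act[1])]
         for s in range(n_states)],
    ]
    # belief[i][s]: player i's belief about the opponent's mixed strategy at s.
    belief = [[[1.0 / n_act[1 - i]] * n_act[1 - i] for _ in range(n_states)]
              for i in range(2)]
    # q[i][s][own][opp]: player i's belief about its own Q-function.
    q = [[[[0.0] * n_act[1 - i] for _ in range(n_act[i])]
          for _ in range(n_states)] for i in range(2)]
    visits = [0] * n_states
    s = 0
    for _ in range(steps):
        # Myopic best response to current beliefs, using the estimated Q.
        acts = [best_response(q[i][s], belief[i][s])[0] for i in range(2)]
        n = visits[s]
        alpha = 1.0 / (n + 1)                      # fast: strategy beliefs (assumed schedule)
        beta = 1.0 / ((n + 1) * math.log(n + 3))   # slow: Q beliefs (assumed schedule)
        for i in range(2):
            # Strategy belief: running average of observed opponent actions at s.
            opp_act = acts[1 - i]
            for k in range(n_act[1 - i]):
                target = 1.0 if k == opp_act else 0.0
                belief[i][s][k] += alpha * (target - belief[i][s][k])
            # Continuation values, computed once before Q at s is modified.
            v = [best_response(q[i][t], belief[i][t])[1] for t in range(n_states)]
            # Model-based Q update at the visited state only.
            for own in range(n_act[i]):
                for opp in range(n_act[1 - i]):
                    a, b = (own, opp) if i == 0 else (opp, own)
                    cont = sum(p * v[t] for t, p in enumerate(trans[s][a][b]))
                    target = pay[i][s][own][opp] + gamma * cont
                    q[i][s][own][opp] += beta * (target - q[i][s][own][opp])
        visits[s] += 1
        # Sample the next state from the known transition kernel.
        s = rng.choices(range(n_states), weights=trans[s][acts[0]][acts[1]])[0]
    return belief, q


# Usage: two states, each a matching-pennies stage game, uniform transitions.
pennies = [[1, -1], [-1, 1]]
reward = [pennies, pennies]
trans = [[[[0.5, 0.5] for _ in range(2)] for _ in range(2)] for _ in range(2)]
belief, q = fictitious_play(reward, trans)
```

In this toy game the stationary equilibrium mixes uniformly in both states, so the strategy beliefs in `belief` should drift toward one half for each action. A model-free variant would replace the expectation over `trans` with a sampled next state and observed payoff; that variant is not shown here.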

