Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions - 专知论文

Learning · Extensibility · Smoothing · Regularization term · Stationary

July 7, 2022

Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Lucas Baudin, Rida Laraki

Recent extensions to dynamic games of the well-known fictitious play learning procedure in static games have been proved to converge globally to stationary Nash equilibria in two important classes of dynamic games (zero-sum and identical-interest discounted stochastic games). However, those decentralized algorithms require the players to know the model exactly (the transition probabilities and their payoffs at every stage). To overcome these strong assumptions, our paper introduces regularizations of the systems in (Leslie 2020; Baudin 2022) to construct a family of new decentralized learning algorithms which are model-free (players don't know the transitions, and their payoffs are perturbed at every stage). Our procedures can be seen as extensions to stochastic games of the classical smooth fictitious play learning procedures in static games (where the players' best responses are regularized, thanks to a smooth strictly concave perturbation of their payoff functions). We prove the convergence of our family of procedures to stationary regularized Nash equilibria in zero-sum and identical-interest discounted stochastic games. The proof uses the continuous smooth best-response dynamics counterparts and stochastic approximation methods. When there is only one player, our problem is an instance of Reinforcement Learning, and our procedures are proved to converge globally to the optimal stationary policy of the regularized MDP. In that sense, they can be seen as an alternative to the well-known Q-learning procedure.
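To make the idea of a regularized best response concrete, here is a minimal sketch of classical smooth fictitious play in a static zero-sum matrix game. It is not the paper's algorithm for stochastic games: it assumes the payoff matrix is known and unperturbed, and it assumes an entropy regularizer, which yields the logit (softmax) response with a hypothetical `temperature` parameter. The abstract only requires a smooth strictly concave perturbation. All function names are illustrative.

```python
import math
import random


def smooth_best_response(payoffs, temperature):
    """Logit response: maximizes <x, payoffs> + temperature * entropy(x) over mixed x.

    The entropy regularizer is an assumption made for this sketch.
    """
    m = max(payoffs)  # subtract the max for numerical stability
    weights = [math.exp((u - m) / temperature) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def expected_payoffs(matrix, opponent_mix):
    """Expected payoff of each row of `matrix` against a mixed opponent strategy."""
    return [sum(a * q for a, q in zip(row, opponent_mix)) for row in matrix]


def update_frequency(freq, action, step):
    """Move the empirical frequency vector a fraction `step` toward the played action."""
    return [(1 - step) * f + step * (1.0 if k == action else 0.0)
            for k, f in enumerate(freq)]


def smooth_fictitious_play(A, steps=20000, temperature=0.2, seed=0):
    """Smooth fictitious play in a static zero-sum game.

    The row player receives A[i][j]; the column player receives -A[i][j].
    Each player samples from the smooth best response to the empirical
    frequency of the opponent's past actions.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    # Column player's payoffs, indexed [own action][opponent action].
    B = [[-A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    row_freq = [1.0 / n_rows] * n_rows  # empirical play of the row player
    col_freq = [1.0 / n_cols] * n_cols  # empirical play of the column player
    for n in range(1, steps + 1):
        x = smooth_best_response(expected_payoffs(A, col_freq), temperature)
        y = smooth_best_response(expected_payoffs(B, row_freq), temperature)
        i = rng.choices(range(n_rows), weights=x)[0]
        j = rng.choices(range(n_cols), weights=y)[0]
        step = 1.0 / (n + 1)
        row_freq = update_frequency(row_freq, i, step)
        col_freq = update_frequency(col_freq, j, step)
    return row_freq, col_freq


if __name__ == "__main__":
    # Matching pennies: by symmetry the regularized equilibrium is uniform play.
    matching_pennies = [[1.0, -1.0], [-1.0, 1.0]]
    print(smooth_fictitious_play(matching_pennies))
```

In matching pennies, both empirical frequencies should drift toward uniform play, which is the entropy-regularized equilibrium by symmetry. The model-free setting of the abstract would replace the exact `expected_payoffs` computation with estimates learned from perturbed payoff observations, which this sketch does not attempt.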
