Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains via Dual Mirror Descent - Zhuanzhi Papers

Independent · Learning · Nash Equilibrium · Stationary · Almost Surely

May 7, 2022

Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains via Dual Mirror Descent

S. Rasoul Etesami

We consider a subclass of $n$-player stochastic games, in which players have their own internal state/action spaces while they are coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can receive only realizations of their payoffs, not the actual functions, and cannot observe each other's states/actions. Under some assumptions on the structure of the payoff functions, we develop efficient learning algorithms based on dual averaging and dual mirror descent, which provably converge almost surely or in expectation to the set of $\epsilon$-Nash equilibrium policies. In particular, we derive upper bounds on the number of iterates that scale polynomially in terms of the game parameters to achieve an $\epsilon$-Nash equilibrium policy. In addition to Markov potential games and linear-quadratic stochastic games, this work provides another subclass of $n$-player stochastic games that provably admit polynomial-time learning algorithms for finding their $\epsilon$-Nash equilibrium policies.
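To make the dual-averaging update mentioned above more concrete, here is a minimal sketch of entropic dual averaging (equivalently, lazy mirror descent with a negative-entropy mirror map, also known as exponential weights) for a single player choosing a mixed action over a finite set while the other players are held fixed. This is not the author's algorithm. The paper works with stationary policies over each player's internal Markov chain, whereas this sketch assumes a single state, a constant step size `eta`, and an unbiased noisy realization of the full payoff vector at every step. The names `entropic_mirror_map`, `dual_averaging`, `best_response_gap`, `sample_payoff`, and the example payoffs are hypothetical. The value returned by `best_response_gap` is the quantity that must be at most $\epsilon$ for every player in an $\epsilon$-Nash equilibrium.

```python
import math
import random
from typing import Callable, List, Sequence


def entropic_mirror_map(y: Sequence[float]) -> List[float]:
    """Map a dual vector y to the probability simplex: x_i is proportional to exp(y_i)."""
    m = max(y)  # subtract the max so exp() cannot overflow
    w = [math.exp(v - m) for v in y]
    total = sum(w)
    return [v / total for v in w]


def dual_averaging(
    sample_payoff: Callable[[], List[float]],
    num_actions: int,
    steps: int,
    eta: float,
) -> List[float]:
    """Hypothetical helper: accumulate payoff realizations in the dual space,
    then map the running sum back to a mixed action."""
    y = [0.0] * num_actions  # dual variable: eta-scaled sum of observed payoffs
    x = entropic_mirror_map(y)  # start from the uniform mixed action
    for _ in range(steps):
        g = sample_payoff()  # assumption: unbiased noisy realization of the payoff vector
        y = [yi + eta * gi for yi, gi in zip(y, g)]  # ascent step taken in the dual space
        x = entropic_mirror_map(y)  # mirror step back onto the simplex
    return x


def best_response_gap(x: Sequence[float], payoff: Sequence[float]) -> float:
    """Best deviation payoff minus the expected payoff of x; x is an
    epsilon-best response exactly when this gap is at most epsilon."""
    expected = sum(p * u for p, u in zip(x, payoff))
    return max(payoff) - expected


# Usage: one player facing a fixed (hypothetical) expected payoff vector.
rng = random.Random(0)
true_payoff = [0.2, 0.7, 0.4]


def sample_payoff() -> List[float]:
    # Each realization is the true payoff plus bounded zero-mean noise.
    return [u + rng.uniform(-0.1, 0.1) for u in true_payoff]


policy = dual_averaging(sample_payoff, num_actions=3, steps=500, eta=0.1)
gap = best_response_gap(policy, true_payoff)
print(policy, gap)
```

With the other players fixed, the mixed action concentrates on the highest-payoff action and the best-response gap shrinks toward zero. In the multi-player setting each player's payoff vector changes as the others update, so convergence depends on the structural assumptions on the payoff functions stated in the abstract; this sketch does not model that coupling.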
