Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO - 专知 Papers


Learning · Correlation coefficient · Agent · Multi-armed bandit · Robustness

August 29, 2022

Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO


Paul Muller, Mark Rowland, Romuald Elie, Georgios Piliouras, Julien Perolat, Mathieu Lauriere, Raphael Marinier, Olivier Pietquin, Karl Tuyls

From arXiv, AAMAS

Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.
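To make the idea of swapping an exact distribution solve for a regret-minimizing learner more concrete, here is a minimal sketch using a generic Hedge (multiplicative weights) update over a fixed population of policies. It is not the paper's mean-field no-adversarial-regret learner. The toy payoff `crowd_averse_payoffs`, the function `hedge_distribution`, and the values of `eta`, `steps`, and `init` are all hypothetical choices made for illustration only. The payoff of each policy depends on the population distribution itself, which is the kind of dependence the abstract attributes to the mean-field setting.

```python
import math


def crowd_averse_payoffs(nu):
    """Hypothetical toy payoff: a policy earns less the larger its population share."""
    return [-share for share in nu]


def hedge_distribution(payoff_fn, num_policies, steps=500, eta=0.5, init=None):
    """Run a Hedge update on the population distribution over policies.

    Returns the last iterate and the time-average of the iterates.
    This is an illustrative sketch, not the algorithm from the paper.
    """
    nu = list(init) if init is not None else [1.0 / num_policies] * num_policies
    avg = [0.0] * num_policies
    for t in range(1, steps + 1):
        # Running average of the distributions played so far.
        avg = [a + (x - a) / t for a, x in zip(avg, nu)]
        # Payoffs are evaluated against the current population distribution.
        payoffs = payoff_fn(nu)
        # Multiplicative-weights step followed by renormalization.
        weights = [x * math.exp(eta * r) for x, r in zip(nu, payoffs)]
        total = sum(weights)
        nu = [w / total for w in weights]
    return nu, avg


final_nu, average_nu = hedge_distribution(
    crowd_averse_payoffs, num_policies=3, init=[0.7, 0.2, 0.1]
)
print("last iterate:", [round(x, 4) for x in final_nu])
print("time average:", [round(x, 4) for x in average_nu])
```

In this toy game the only symmetric equilibrium is the uniform distribution, so the last iterate drifting toward equal shares is the behavior one would hope to see. A real mean-field PSRO loop would also add new best-response policies to the population between such distribution updates, which this sketch leaves out.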

