Last-Iterate Convergence with Full- and Noisy-Information Feedback in Two-Player Zero-Sum Games - Zhuanzhi (专知) Papers

Nash equilibrium · Learning · Weight · Exact · Motivation

August 21, 2022

Last-Iterate Convergence with Full- and Noisy-Information Feedback in Two-Player Zero-Sum Games

Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki

The theory of learning in games is prominent in the AI community, motivated by several rising applications such as multi-agent reinforcement learning and Generative Adversarial Networks. We propose Mutation-driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and prove that it exhibits the last-iterate convergence property in both full- and noisy-information feedback settings. In the full-information feedback setting, the players observe their exact gradient vectors of the utility functions. On the other hand, in the noisy-information feedback setting, they can only observe the noisy gradient vectors. Existing algorithms, including the well-known Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms, fail to converge to a Nash equilibrium with noisy-information feedback. In contrast, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both of the feedback settings. We then prove that it converges to an exact Nash equilibrium by adapting the mutation term iteratively. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.
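The contrast the abstract draws between plain MWU and a mutation-perturbed variant can be illustrated with a small sketch. The abstract does not state the M2WU update rule, so the perturbation used here, adding `mu * (ref - p) / p` to each player's gradient before a standard multiplicative weights step, is an assumption made for illustration only. The helper names (`run`, `mwu_step`, `exploitability`), the uniform reference strategy, the Gaussian noise model, and all parameter values are likewise hypothetical.

```python
import numpy as np

def exploitability(A, x, y):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    # Row player maximizes x^T A y, column player minimizes it.
    return float(np.max(A @ y) - np.min(A.T @ x))

def mwu_step(p, grad, eta):
    """One multiplicative weights step, computed in log space for stability."""
    logits = np.log(p) + eta * grad
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

def run(A, x0, y0, steps=5000, eta=0.05, mu=0.0, noise=0.0, seed=0):
    """Simultaneous updates for both players; mu=0 recovers plain MWU."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    # Assumed reference strategies for the mutation term: uniform.
    rx = np.full_like(x, 1.0 / x.size)
    ry = np.full_like(y, 1.0 / y.size)
    for _ in range(steps):
        # Gradient feedback; noise > 0 models the noisy-information setting.
        gx = A @ y + noise * rng.standard_normal(x.size)
        gy = -A.T @ x + noise * rng.standard_normal(y.size)
        # Assumed mutation perturbation pulling each strategy toward its reference.
        gx = gx + mu * (rx - x) / x
        gy = gy + mu * (ry - y) / y
        x, y = mwu_step(x, gx, eta), mwu_step(y, gy, eta)
    return x, y

# Matching pennies: the unique Nash equilibrium is uniform play for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x0, y0 = [0.7, 0.3], [0.4, 0.6]
gap_mwu = exploitability(A, *run(A, x0, y0, mu=0.0))
gap_mut = exploitability(A, *run(A, x0, y0, mu=0.1))
print("last-iterate exploitability, plain MWU:", gap_mwu)
print("last-iterate exploitability, with mutation:", gap_mut)
```

In this symmetric game the uniform reference happens to coincide with the equilibrium, so the perturbed dynamics settle on it. In general, a fixed mutation term would only be expected to reach a stationary point near an equilibrium, which is why the abstract describes adapting the mutation term iteratively to recover an exact one.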
