关于零和零和沙尘运动会中独立学习动态的异质性 (On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games) - 专知论文

会员服务 ·

0

相互独立的 · 几乎必然收敛 · 衰减系数 · 学成 · 几乎必然 ·

2021 年 12 月 12 日

On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

翻译：关于零和零和沙尘运动会中独立学习动态的异质性

Muhammed O. Sayin,K. Alperen Cetiner

from arxiv, Extended version

We analyze the convergence properties of the two-timescale fictitious play combining the classical fictitious play with the Q-learning for two-player zero-sum stochastic games with player-dependent learning rates. We show its almost sure convergence under the standard assumptions in two-timescale stochastic approximation methods when the discount factor is less than the product of the ratios of player-dependent step sizes. To this end, we formulate a novel Lyapunov function formulation and present a one-sided asynchronous convergence result.

翻译：我们分析了将经典假剧与Q-学习相结合的两玩零和零和随机游戏与以玩家为依存学习率的玩家双玩游戏的双重规模虚构游戏的趋同性。当贴现系数低于玩家依存步脚大小比率的产物时,我们几乎可以肯定地显示它与标准假设的双重规模随机切换近似方法的趋同性。为此,我们制作了一部新型的Lyapunov函数配制,并提出了片面的非同步趋同性结果。

0

相关内容

相互独立的

相互独立的

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

车联网白皮书，44页pdf

车联网白皮书，44页pdf

专知会员服务

80+阅读 · 2022年1月3日

计算机理论顶会STOC 2021奖项出炉，滕尚华等华人学者获奖

专知会员服务

8+阅读 · 2021年7月22日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness

Arxiv

0+阅读 · 2022年2月15日

The Impact of Batch Learning in Stochastic Linear Bandits

Arxiv

0+阅读 · 2022年2月14日

A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization

Arxiv

0+阅读 · 2022年2月13日

Stochastic Client Selection for Federated Learning with Volatile Clients

Arxiv

0+阅读 · 2022年2月12日

Adaptive Regret for Control of Time-Varying Dynamics

Arxiv

0+阅读 · 2022年2月12日

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

Arxiv

0+阅读 · 2022年2月11日

Computational-Statistical Gaps in Reinforcement Learning

Arxiv

0+阅读 · 2022年2月11日

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Arxiv

4+阅读 · 2020年6月20日

Interference and Generalization in Temporal Difference Learning

Arxiv

8+阅读 · 2020年3月13日

Mean Field Multi-Agent Reinforcement Learning

Arxiv

5+阅读 · 2018年6月12日

VIP会员

文章信息

相关主题

相互独立的

几乎必然收敛

相关VIP内容

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

车联网白皮书，44页pdf

车联网白皮书，44页pdf

专知会员服务

80+阅读 · 2022年1月3日

计算机理论顶会STOC 2021奖项出炉，滕尚华等华人学者获奖

专知会员服务

8+阅读 · 2021年7月22日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【牛津博士论文】零样本强化学习综述

《美军条令：陆军指挥官与规划人员地理空间指南》60页

战术边缘指挥控制：防务面临的核心挑战

迈向开放世界检测：综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness

Arxiv

0+阅读 · 2022年2月15日

The Impact of Batch Learning in Stochastic Linear Bandits

Arxiv

0+阅读 · 2022年2月14日

A Projection-free Algorithm for Constrained Stochastic Multi-level Composition Optimization

Arxiv

0+阅读 · 2022年2月13日

Stochastic Client Selection for Federated Learning with Volatile Clients

Arxiv

0+阅读 · 2022年2月12日

Adaptive Regret for Control of Time-Varying Dynamics

Arxiv

0+阅读 · 2022年2月12日

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance

Arxiv

0+阅读 · 2022年2月11日

Computational-Statistical Gaps in Reinforcement Learning

Arxiv

0+阅读 · 2022年2月11日

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Arxiv

4+阅读 · 2020年6月20日

Interference and Generalization in Temporal Difference Learning

Arxiv

8+阅读 · 2020年3月13日

Mean Field Multi-Agent Reinforcement Learning

Arxiv

5+阅读 · 2018年6月12日

微信扫码咨询专知VIP会员