高效Q 游戏游戏游戏学习 (Efficient-Q Learning for Stochastic Games) - 专知论文

会员服务 ·

0

估计/估计量 · 近似误差 · Learning · 近似 · 平稳的 ·

2023 年 2 月 20 日

Efficient-Q Learning for Stochastic Games

翻译：高效Q 游戏游戏游戏学习

Muhammed O. Sayin,Onur Unlu

We present the new efficient-Q learning dynamics for stochastic games beyond the recent concentration of progress on provable convergence to possibly inefficient equilibrium. We let agents follow the log-linear learning dynamics in stage games whose payoffs are the Q-functions and estimate the Q-functions iteratively with a vanishing stepsize. This (implicitly) two-timescale dynamic makes stage games relatively stationary for the log-linear update so that the agents can track the efficient equilibrium of stage games. We show that the Q-function estimates converge to the Q-function associated with the efficient equilibrium in identical-interest stochastic games, almost surely, with an approximation error induced by the softmax response in the log-linear update. The key idea is to approximate the dynamics with a fictional scenario where Q-function estimates are stationary over finite-length epochs. We then couple the dynamics in the main and fictional scenarios to show that the approximation error decays to zero due to the vanishing stepsize.

翻译：我们展示了新的高效- Q 游戏的学习动态, 超越最近进展集中在可变趋同到可能效率低下的平衡上。我们让代理商在以Q函数为回报的舞台游戏中跟踪日志- 线性学习动态, 并反复估算Q函数, 并进行递减步骤的迭接。这个( 隐含的) 两年级动态使舞台游戏相对固定, 以便代理商能够跟踪阶段性游戏的有效平衡。我们显示, Q函数估计会与相同利益类同的游戏中与有效平衡相关的Q函数相趋一致, 几乎可以肯定, 由日志- 线性更新中的软负反应引发的近似错误。关键的想法是用虚构的假想来近似该动态, Q函数估计是固定在有限时间范围内的。然后将主要和虚构性情景中的动态进行对比, 以显示近似误衰减为零, 原因是渐渐渐的步化。

0

相关内容

估计/估计量

估计/估计量

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

一类稳态Schödinger-Poisson-Slater方程标准化解的研究

国家自然科学基金

1+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

高效高选择性的动态超分子纳米容器绿色催化剂的开发

国家自然科学基金

0+阅读 · 2014年12月31日

HLA区基因多态性与帕金森病的关联分析及其分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

动态环境下自适应免疫优化理论与算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

高固溶度Mg-RE二元合金塑性变形机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

CIB1对脑缺血半暗带微血管作用机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

Arxiv

0+阅读 · 2023年4月11日

A Family of Iteration Functions for General Linear Systems

Arxiv

0+阅读 · 2023年4月11日

On anti-stochastic properties of unlabeled graphs

Arxiv

0+阅读 · 2023年4月8日

Markov Games with Decoupled Dynamics: Price of Anarchy and Sample Complexity

Arxiv

0+阅读 · 2023年4月7日

Bayesian Deep Learning for Graphs

Arxiv

23+阅读 · 2022年2月24日

Collective Intelligence for Deep Learning: A Survey of Recent Developments

Arxiv

22+阅读 · 2021年12月22日

Multi-task Learning of Order-Consistent Causal Graphs

Arxiv

10+阅读 · 2021年11月3日

A Survey of Quantization Methods for Efficient Neural Network Inference

Arxiv

22+阅读 · 2021年6月21日

The Confluence of Networks, Games and Learning

Arxiv

94+阅读 · 2021年5月17日

Distributed Machine Learning on Mobile Devices: A Survey

Distributed Machine Learning on Mobile Devices: A Survey

Arxiv

37+阅读 · 2019年9月18日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

125+阅读 · 2022年4月21日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

NLP 2018 Highlights：2018自然语言处理技术亮点汇总

AINLP

10+阅读 · 2019年2月9日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

Arxiv

0+阅读 · 2023年4月11日

A Family of Iteration Functions for General Linear Systems

Arxiv

0+阅读 · 2023年4月11日

On anti-stochastic properties of unlabeled graphs

Arxiv

0+阅读 · 2023年4月8日

Markov Games with Decoupled Dynamics: Price of Anarchy and Sample Complexity

Arxiv

0+阅读 · 2023年4月7日

Bayesian Deep Learning for Graphs

Arxiv

23+阅读 · 2022年2月24日

Collective Intelligence for Deep Learning: A Survey of Recent Developments

Arxiv

22+阅读 · 2021年12月22日

Multi-task Learning of Order-Consistent Causal Graphs

Arxiv

10+阅读 · 2021年11月3日

A Survey of Quantization Methods for Efficient Neural Network Inference

Arxiv

22+阅读 · 2021年6月21日

The Confluence of Networks, Games and Learning

Arxiv

94+阅读 · 2021年5月17日

Distributed Machine Learning on Mobile Devices: A Survey

Distributed Machine Learning on Mobile Devices: A Survey

Arxiv

37+阅读 · 2019年9月18日

相关基金

一类稳态Schödinger-Poisson-Slater方程标准化解的研究

国家自然科学基金

1+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

高效高选择性的动态超分子纳米容器绿色催化剂的开发

国家自然科学基金

0+阅读 · 2014年12月31日

HLA区基因多态性与帕金森病的关联分析及其分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

动态环境下自适应免疫优化理论与算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

高固溶度Mg-RE二元合金塑性变形机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

改进Max-SAT算法的关键技术研究

国家自然科学基金

0+阅读 · 2009年12月31日

CIB1对脑缺血半暗带微血管作用机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员