在动态定价中为n-Player Markov运动会学习 Nash 平衡 (Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing) - 专知论文

会员服务 ·

0

Markov · 纳什均衡 · Learning · Agent · 近似 ·

2022 年 8 月 25 日

Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing

翻译：在动态定价中为n-Player Markov运动会学习 Nash 平衡

from arxiv, 9 pages main paper, 21 pages including appendix

We investigate Nash equilibrium learning in a competitive Markov Game (MG) environment, where multiple agents compete, and multiple Nash equilibria can exist. In particular, for an oligopolistic dynamic pricing environment, exact Nash equilibria are difficult to obtain due to the curse-of-dimensionality. We develop a new model-free method to find approximate Nash equilibria. Gradient-free black box optimization is then applied to estimate $\epsilon$, the maximum reward advantage of an agent unilaterally deviating from any joint policy, and to also estimate the $\epsilon$-minimizing policy for any given state. The policy-$\epsilon$ correspondence and the state to $\epsilon$-minimizing policy are represented by neural networks, the latter being the Nash Policy Net. During batch update, we perform Nash Q learning on the system, by adjusting the action probabilities using the Nash Policy Net. We demonstrate that an approximate Nash equilibrium can be learned, particularly in the dynamic pricing domain where exact solutions are often intractable.

翻译：我们在竞争激烈的Markov游戏(MG)环境中调查纳什平衡学习,在这个环境中,多个代理商可以相互竞争,多个纳什平衡可以存在。特别是,对于寡头主义动态定价环境,由于维度的诅咒,很难获得准确的纳什平衡。我们开发了一种新的无模式方法来寻找大约纳什平衡。然后,将无梯度黑盒优化用于估算美元,即单方面偏离任何联合政策的代理商的最大奖赏优势,并且还估算任何特定州以美元为最小化的政策。政策-美元-百分率通信和以美元为最小化的政策由神经网络代表,后者是纳什政策网。在批次更新过程中,我们通过调整使用纳什政策网的行动概率,在系统上学习纳什Q。我们证明,可以学习到一种近似纳什平衡,特别是在动态定价领域,确切的解决方案往往难以解决。

0

相关内容

Markov

【伯克利-Pieter Abbeel】深度强化学习基础，附slides与视频

专知会员服务

29+阅读 · 2021年8月26日

手写实现李航《统计学习方法》书中全部算法

专知会员服务

142+阅读 · 2020年5月19日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

谷歌足球游戏环境使用介绍

谷歌足球游戏环境使用介绍

CreateAMind

33+阅读 · 2019年6月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Bacillus megaterium Q3降解二氯喹啉酸分子机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

类泛素化修饰Neddylation在DNA损伤应答中的调控作用及分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

稀土三氢化物高压下的金属-绝缘体相变与超导相变研究

国家自然科学基金

0+阅读 · 2014年12月31日

磁性钴(II)和镍(II)簇合物的组装，溶液性质及形成过程研究

国家自然科学基金

0+阅读 · 2013年12月31日

直径可控一维低熔点金属Sn、Bi、In、Zn纳米线阵列的构建及储锂过程机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

复几何中的对称性及其在数学物理中的应用

国家自然科学基金

0+阅读 · 2012年12月31日

新型MnBi/α-Fe双相复合纳米晶永磁体的微结构及磁硬化机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

Numbl-TRAF6-TAB2对NF-kappa B活性的调节在小胶质细胞炎性活化中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

非过渡金属杂质、缺陷对拓扑绝缘体材料改性的理论研究

国家自然科学基金

0+阅读 · 2011年12月31日

铁电配合物的合成，结构与性质研究

国家自然科学基金

0+阅读 · 2008年12月31日

Cooperative Coverage with a Leader and a Wingmate in Communication-Constrained Environments

Arxiv

0+阅读 · 2022年10月6日

Cost Design in Atomic Routing Games

Arxiv

0+阅读 · 2022年10月3日

ObPose: Leveraging Pose for Object-Centric Scene Inference in 3D

Arxiv

0+阅读 · 2022年10月3日

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

Arxiv

0+阅读 · 2022年10月3日

Latent State Marginalization as a Low-cost Approach for Improving Exploration

Arxiv

0+阅读 · 2022年10月3日

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

Arxiv

0+阅读 · 2022年10月3日

Saving the Limping: Fault-tolerant Quadruped Locomotion via Reinforcement Learning

Arxiv

0+阅读 · 2022年10月2日

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Arxiv

0+阅读 · 2022年9月30日

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games

Arxiv

0+阅读 · 2022年9月30日

The Principles of Deep Learning Theory

Arxiv

65+阅读 · 2021年6月18日

VIP会员

文章信息

相关主题

相关VIP内容

【伯克利-Pieter Abbeel】深度强化学习基础，附slides与视频

专知会员服务

29+阅读 · 2021年8月26日

手写实现李航《统计学习方法》书中全部算法

专知会员服务

142+阅读 · 2020年5月19日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

GPT-5如何对齐？从硬性拒绝到安全完成：走向以输出为中心的安全训练

【伯克利博士论文】超越人类监督的视觉智能

【ICCV2025】SO(3) 上连续非保守动力系统的预测

2025年中国数据要素行业发展研究报告

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

谷歌足球游戏环境使用介绍

谷歌足球游戏环境使用介绍

CreateAMind

33+阅读 · 2019年6月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Cooperative Coverage with a Leader and a Wingmate in Communication-Constrained Environments

Arxiv

0+阅读 · 2022年10月6日

Cost Design in Atomic Routing Games

Arxiv

0+阅读 · 2022年10月3日

ObPose: Leveraging Pose for Object-Centric Scene Inference in 3D

Arxiv

0+阅读 · 2022年10月3日

Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

Arxiv

0+阅读 · 2022年10月3日

Latent State Marginalization as a Low-cost Approach for Improving Exploration

Arxiv

0+阅读 · 2022年10月3日

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

Arxiv

0+阅读 · 2022年10月3日

Saving the Limping: Fault-tolerant Quadruped Locomotion via Reinforcement Learning

Arxiv

0+阅读 · 2022年10月2日

Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Arxiv

0+阅读 · 2022年9月30日

A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games

Arxiv

0+阅读 · 2022年9月30日

The Principles of Deep Learning Theory

Arxiv

65+阅读 · 2021年6月18日

相关基金

Bacillus megaterium Q3降解二氯喹啉酸分子机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

类泛素化修饰Neddylation在DNA损伤应答中的调控作用及分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

稀土三氢化物高压下的金属-绝缘体相变与超导相变研究

国家自然科学基金

0+阅读 · 2014年12月31日

磁性钴(II)和镍(II)簇合物的组装，溶液性质及形成过程研究

国家自然科学基金

0+阅读 · 2013年12月31日

直径可控一维低熔点金属Sn、Bi、In、Zn纳米线阵列的构建及储锂过程机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

复几何中的对称性及其在数学物理中的应用

国家自然科学基金

0+阅读 · 2012年12月31日

新型MnBi/α-Fe双相复合纳米晶永磁体的微结构及磁硬化机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

Numbl-TRAF6-TAB2对NF-kappa B活性的调节在小胶质细胞炎性活化中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

非过渡金属杂质、缺陷对拓扑绝缘体材料改性的理论研究

国家自然科学基金

0+阅读 · 2011年12月31日

铁电配合物的合成，结构与性质研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员