Stackelberg POMDP: 经济设计强化学习方法 (Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design) - 专知论文

会员服务 ·

0

Learning · INFORMS · 优化器 · 设计 · 回合 ·

2023 年 2 月 15 日

Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

翻译：Stackelberg POMDP: 经济设计强化学习方法

Gianluca Brero,Alon Eden,Darshan Chakrabarti,Matthias Gerstgrasser,Vincent Li,David C. Parkes

We introduce a reinforcement learning framework for economic design problems. We model the interaction between the designer of the economic environment and the participants as a Stackelberg game: the designer (leader) sets up the rules, and the participants (followers) respond strategically. We model the followers via no-regret dynamics, which converge to a Bayesian Coarse-Correlated Equilibrium (B-CCE) of the game induced by the leader. We embed the followers' no-regret dynamics in the leader's learning environment, which allows us to formulate our learning problem as a POMDP. We call this POMDP the Stackelberg POMDP. We prove that the optimal policy of the Stackelberg POMDP achieves the same utility as the optimal leader's strategy in our Stackelberg game. We solve the Stackelberg POMDP using an actor-critic method, where the critic can access the joint information of all agents. Finally, we show that we are able to learn optimal leader strategies in a variety of settings, including scenarios where the leader is participating in or designing normal-form games, as well as settings with incomplete information that capture common aspects of indirect mechanism design such as limited communication and turn-taking play by agents.

翻译：我们引入了经济设计问题强化学习框架。我们将经济环境设计者和参与者之间的互动模式建模为Stackelberg游戏:设计者(领导者)制定规则,参与者(追随者)做出战略反应。我们通过无雷动态模型构建追随者,这与领导人引领的游戏中的巴伊西亚-科萨-科科科相关电子平衡(B-CCE)相融合。我们将追随者无雷动态嵌入领导者的学习环境中,这使我们能够将我们的学习问题发展成一个POMDP。我们称这个POMDP为Stackelberg POMDP。我们证明,Stackelberg POMDP的最佳政策与我们Stakelberg游戏中的最佳领导者战略具有同样的效用。我们用一个演员-celberg POMDP(B-CE) 方法解决了Stackelberg POMDP(BOMDP), 使评论家能够获取所有代理人的联合信息。最后, 我们表明,我们能够在各种环境中学习最佳的领导策略,包括领导者参与的场景象,或者设计正常设计游戏,作为普通的游戏的场景象,作为不完全的游戏,作为普通的游戏的场景象。

0

相关内容

Learning

【AI+商业投资】法国兴业银行《深度强化学习在投资组合分配中的应用》26页PPT，Deep Reinforcement Learning for portfolio allocation

【AI+商业投资】法国兴业银行《深度强化学习在投资组合分配中的应用》26页PPT，Deep Reinforcement Learning for portfolio allocation

专知会员服务

24+阅读 · 2022年4月1日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

专知会员服务

35+阅读 · 2019年11月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

全球首个GNN为主的AI创业公司，募资$18.5 million！

全球首个GNN为主的AI创业公司，募资$18.5 million！

图与推荐

1+阅读 · 2022年4月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

具有网络结构的演化博弈动力学

国家自然科学基金

0+阅读 · 2013年12月31日

基于构件的软件开发中构件选择与集成优化方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

随机利率与随机波动率模型下保险公司最优投资与再保险问题研究

国家自然科学基金

0+阅读 · 2013年12月31日

复杂不确定环境下鲁棒投资组合优化模型及决策研究

国家自然科学基金

4+阅读 · 2012年12月31日

保险投资组合随机模型中风险控制及优化分红问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

去酰基化ghrelin改善脂肪组织炎症所致胰岛素抵抗的机制- - 调节性T细胞的作用

国家自然科学基金

0+阅读 · 2011年12月31日

一类necroptosis诱导剂抗肿瘤干细胞的研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于随机图模型的蛋白质三级结构预测算法研究

国家自然科学基金

1+阅读 · 2008年12月31日

Planning for Attacker Entrapment in Adversarial Settings

Arxiv

0+阅读 · 2023年4月5日

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年4月5日

A Model for Multi-Agent Heterogeneous Interaction Problems

Arxiv

0+阅读 · 2023年4月4日

Empirical Design in Reinforcement Learning

Arxiv

0+阅读 · 2023年4月3日

Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning

Arxiv

2+阅读 · 2023年4月2日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

44+阅读 · 2022年8月2日

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Arxiv

19+阅读 · 2022年5月13日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

VIP会员

文章信息

相关主题

相关VIP内容

【AI+商业投资】法国兴业银行《深度强化学习在投资组合分配中的应用》26页PPT，Deep Reinforcement Learning for portfolio allocation

【AI+商业投资】法国兴业银行《深度强化学习在投资组合分配中的应用》26页PPT，Deep Reinforcement Learning for portfolio allocation

专知会员服务

24+阅读 · 2022年4月1日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

【变分推断课件】Lectures on Variational Inference： Approximate Bayesian Inference in Machine Learning（附带pdf）

专知会员服务

35+阅读 · 2019年11月30日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《面向无人机集群的避障动态传感器覆盖算法》最新38页

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

全球首个GNN为主的AI创业公司，募资$18.5 million！

全球首个GNN为主的AI创业公司，募资$18.5 million！

图与推荐

1+阅读 · 2022年4月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Planning for Attacker Entrapment in Adversarial Settings

Arxiv

0+阅读 · 2023年4月5日

A Policy-Guided Imitation Approach for Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年4月5日

A Model for Multi-Agent Heterogeneous Interaction Problems

Arxiv

0+阅读 · 2023年4月4日

Empirical Design in Reinforcement Learning

Arxiv

0+阅读 · 2023年4月3日

Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning

Arxiv

2+阅读 · 2023年4月2日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

44+阅读 · 2022年8月2日

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning

Arxiv

19+阅读 · 2022年5月13日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

相关基金

具有临界指数的Schrodinger-Poisson系统的解

国家自然科学基金

0+阅读 · 2013年12月31日

具有网络结构的演化博弈动力学

国家自然科学基金

0+阅读 · 2013年12月31日

基于构件的软件开发中构件选择与集成优化方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

随机利率与随机波动率模型下保险公司最优投资与再保险问题研究

国家自然科学基金

0+阅读 · 2013年12月31日

复杂不确定环境下鲁棒投资组合优化模型及决策研究

国家自然科学基金

4+阅读 · 2012年12月31日

保险投资组合随机模型中风险控制及优化分红问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

去酰基化ghrelin改善脂肪组织炎症所致胰岛素抵抗的机制- - 调节性T细胞的作用

国家自然科学基金

0+阅读 · 2011年12月31日

一类necroptosis诱导剂抗肿瘤干细胞的研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于随机图模型的蛋白质三级结构预测算法研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员