Function approximation (FA) has been a critical component in solving large zero-sum games. Yet, little attention has been given to FA for solving \textit{general-sum} extensive-form games, despite their being widely regarded as computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, no simple analogue exists to the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF) -- a generalization of the state value function to general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method to apply FA to the Stackelberg setting, allowing us to scale to much larger games while retaining performance guarantees based on the FA error. Additionally, our method guarantees incentive compatibility and is easy to evaluate without relying on self-play or approximate best-response oracles.