Representation Learning for General-sum Low-rank Markov Games - Zhuanzhi (专知) Papers


Markov · Sample complexity · Representation · Learning · Representation learning

October 30, 2022

Representation Learning for General-sum Low-rank Markov Games


Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

We study multi-agent general-sum Markov games with nonlinear function approximation. We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown nonlinear representation. The goal is to design an algorithm that (1) finds an $\varepsilon$-equilibrium policy sample-efficiently, without prior knowledge of the environment or the representation, and (2) permits a deep-learning-friendly implementation. We leverage representation learning and present a model-based and a model-free approach to construct an effective representation from the collected data. For both approaches, the algorithm achieves a sample complexity of $\mathrm{poly}(H, d, A, 1/\varepsilon)$, where $H$ is the game horizon, $d$ is the dimension of the feature vector, $A$ is the size of the joint action space, and $\varepsilon$ is the optimality gap. Because $A$ is the size of the joint action space, this sample complexity can scale exponentially with the number of players in the worst case. To address this challenge, we consider Markov games with a factorized transition structure and present an algorithm that escapes such exponential scaling. To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (nonlinear) function approximation. We accompany our theoretical result with a neural network-based implementation of our algorithm and evaluate it against the widely used deep RL baseline, DQN with fictitious play.
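The low-rank structure and the joint action count $A$ mentioned in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a toy latent-variable instance in which the transition factorizes as $P(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle$ with $\phi(s, a)$ a distribution over $d$ latent factors, and all names (`low_rank_transition`, `matrix_rank`, `actions_per_player`) are hypothetical. Exact rational arithmetic is used so the rank and normalization checks are not affected by rounding.

```python
import random
from fractions import Fraction
from math import prod


def random_distribution(size, rng):
    """Return a random probability vector of exact Fractions."""
    weights = [rng.randint(1, 9) for _ in range(size)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def low_rank_transition(num_states, num_joint_actions, d, rng):
    """Build P[(s, a)][s'] = sum_k phi[(s, a)][k] * mu[k][s'] (assumed toy model)."""
    # phi maps each (state, joint action) pair to a distribution over d latent factors.
    phi = [random_distribution(d, rng) for _ in range(num_states * num_joint_actions)]
    # mu maps each latent factor to a distribution over next states.
    mu = [random_distribution(num_states, rng) for _ in range(d)]
    return [
        [sum(row[k] * mu[k][s_next] for k in range(d)) for s_next in range(num_states)]
        for row in phi
    ]


def matrix_rank(matrix):
    """Exact rank via Gaussian elimination over Fractions."""
    rows = [list(r) for r in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


rng = random.Random(0)
actions_per_player = [2, 2, 2]   # three players with two actions each (illustrative)
A = prod(actions_per_player)     # joint action space size: 2 * 2 * 2 = 8
S, d = 6, 3                      # number of states and feature dimension (illustrative)
P = low_rank_transition(S, A, d, rng)

assert all(sum(row) == 1 for row in P)   # every row is a valid next-state distribution
assert matrix_rank(P) <= d               # rank is bounded by the feature dimension
print(len(P), "state-action rows, rank", matrix_rank(P), "with", A, "joint actions")
```

The transition matrix has $S \cdot A$ rows but rank at most $d$, which is the structure a learned representation would exploit. Adding a fourth two-action player doubles `A`, which is the exponential growth in the number of players that the factorized transition structure is meant to avoid.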

