Coordination is one of the most difficult aspects of multi-agent reinforcement learning (MARL). One reason is that agents normally choose their actions independently of one another. In order to see coordination strategies emerge from the combination of independent policies, recent research has focused on the use of a centralized function (CF) that learns each agent's contribution to the team reward. However, the structure in which the environment is presented to the agents and to the CF is typically overlooked. We observe that the features used to describe the coordination problem can be represented as vertex features of a latent graph structure. Here we present TransfQMix, a new approach that uses transformers to leverage this latent structure and learn better coordination policies. Our transformer agents perform graph reasoning over the states of the observable entities. Our transformer Q-mixer learns a monotonic mixing function from a larger graph that includes the internal and external states of the agents. TransfQMix is designed to be entirely transferable, meaning that the same parameters can be used to control and train larger or smaller teams of agents. This makes it possible to deploy promising MARL approaches for saving training time and deriving general policies, such as transfer learning, zero-shot transfer, and curriculum learning. We report TransfQMix's performance in the Spread and StarCraft II environments. In both settings, it outperforms state-of-the-art Q-learning models and proves effective at solving problems that other methods cannot solve.
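For concreteness, the sketch below illustrates the kind of transformer-based monotonic Q-mixer the abstract describes: a transformer encoder attends over the vertex features of the latent graph, and its pooled embedding parameterizes QMIX-style mixing weights whose absolute value enforces monotonicity. This is a minimal PyTorch sketch under stated assumptions, not the paper's implementation; the class name `TransformerMixerSketch`, layer sizes, and mean pooling are illustrative choices.

```python
import torch
import torch.nn as nn


class TransformerMixerSketch(nn.Module):
    """Illustrative sketch (not the authors' code): a QMIX-style monotonic
    mixer whose mixing parameters are produced by a transformer encoder
    attending over per-entity state features (the latent graph's vertices).
    Hyperparameters and layer layout are assumptions for illustration."""

    def __init__(self, n_agents: int, entity_dim: int, emb_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed = nn.Linear(entity_dim, emb_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Heads mapping the pooled graph embedding to mixing parameters.
        self.w1 = nn.Linear(emb_dim, n_agents * emb_dim)
        self.b1 = nn.Linear(emb_dim, emb_dim)
        self.w2 = nn.Linear(emb_dim, emb_dim)
        self.b2 = nn.Linear(emb_dim, 1)

    def forward(self, agent_qs: torch.Tensor, entities: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) per-agent Q-values.
        # entities: (batch, n_entities, entity_dim) vertex features.
        h = self.encoder(self.embed(entities)).mean(dim=1)  # pooled graph embedding
        # Absolute-value weights keep Q_tot monotonic in each agent's Q (QMIX constraint).
        w1 = torch.abs(self.w1(h)).view(-1, self.n_agents, h.size(-1))
        b1 = self.b1(h).unsqueeze(1)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.w2(h)).unsqueeze(-1)
        q_tot = torch.bmm(hidden, w2).squeeze(-1) + self.b2(h)
        return q_tot  # (batch, 1) joint action-value
```

Because the mixing parameters are generated from entity embeddings rather than from a fixed-size state vector, the same weights can in principle be applied to teams of different sizes, which is the transferability property the abstract emphasizes.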