Due to the representational limitations of the joint Q value function, multi-agent reinforcement learning methods with linear value decomposition (LVD) or monotonic value decomposition (MVD) suffer from relative overgeneralization. As a result, they cannot ensure optimal consistency (i.e., the correspondence between individual greedy actions and the maximal true Q value). In this paper, we derive the expression of the joint Q value function under LVD and MVD. Based on this expression, we draw a transition diagram in which each self-transition node (STN) is a possible convergence point. To ensure optimal consistency, the optimal node must be the unique STN. We therefore propose greedy-based value representation (GVR), which turns the optimal node into an STN via inferior target shaping and further eliminates non-optimal STNs via superior experience replay. In addition, GVR achieves an adaptive trade-off between optimality and stability. Our method outperforms state-of-the-art baselines in experiments on various benchmarks. Theoretical proofs and empirical results on matrix games demonstrate that GVR ensures optimal consistency under sufficient exploration.
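The failure of optimal consistency under relative overgeneralization can be illustrated on a single-state matrix game of the kind mentioned in the abstract. Below is a minimal sketch (the payoff values and the least-squares fit under uniform exploration are illustrative assumptions standing in for Q-learning with linear value decomposition, not the paper's method): the individual greedy actions recovered from the fitted decomposition do not correspond to the joint action with the maximal true Q value.

```python
import numpy as np

# Illustrative 2-agent, 3-action matrix game exhibiting relative
# overgeneralization (payoff values are assumed for illustration).
# The optimal joint action is (0, 0) with payoff 8.
payoff = np.array([
    [  8.0, -12.0, -12.0],
    [-12.0,   0.0,   0.0],
    [-12.0,   0.0,   0.0],
])
n_actions = payoff.shape[0]

# Fit a linear value decomposition Q_jt(a1, a2) = Q_1(a1) + Q_2(a2)
# by least squares, assuming uniform exploration (every joint action
# visited equally often).
rows, targets = [], []
for a1 in range(n_actions):
    for a2 in range(n_actions):
        x = np.zeros(2 * n_actions)
        x[a1] = 1.0              # indicator for agent 1's action
        x[n_actions + a2] = 1.0  # indicator for agent 2's action
        rows.append(x)
        targets.append(payoff[a1, a2])
w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
q1, q2 = w[:n_actions], w[n_actions:]

greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
optimal = np.unravel_index(np.argmax(payoff), payoff.shape)
print("individual greedy joint action:", greedy)                    # not (0, 0)
print("true optimal joint action:     ", tuple(map(int, optimal)))  # (0, 0)
```

Because the decomposed values average over the other agent's exploratory actions, the heavily punished neighborhood of the optimal joint action drags its individual utilities below those of the safe but suboptimal actions, which is exactly the optimal-consistency violation the paper targets.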