Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization - Zhuanzhi Papers


February 16, 2022

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization


Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li

In cooperative multi-agent systems, agents take actions jointly and receive a shared team reward instead of individual rewards. Because there are no individual reward signals, credit assignment mechanisms are usually introduced to discriminate among the contributions of different agents and thereby achieve effective cooperation. Recently, the value decomposition paradigm has been widely adopted to realize credit assignment, and QMIX has become the state-of-the-art solution. In this paper, we revisit QMIX from two aspects. First, we propose a new perspective for measuring credit assignment and empirically show that QMIX suffers from limited discriminability when assigning credit to agents. Second, we propose a gradient entropy regularization for QMIX that realizes a discriminative credit assignment, thereby improving overall performance. The experiments demonstrate that our approach can improve learning efficiency relative to the compared methods and achieve better performance.
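The abstract does not spell out the credit measurement or the regularizer, so the following is only a minimal sketch under stated assumptions. It makes three assumptions:

- An agent's credit is the gradient dQ_tot/dQ_i of the mixer output with respect to that agent's utility.
- These gradients are normalized into a distribution across agents.
- The entropy of that distribution is added to the TD loss with a hypothetical weight `lam`.

The mixer follows the usual QMIX form, with absolute-value weights for monotonicity and an ELU hidden layer. The state-conditioned hypernetworks are omitted, and the weights are passed in directly. The names `qmix_mix`, `gradient_entropy` and `regularized_loss` are illustrative and are not taken from the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def elu(x: float) -> float:
    """ELU with alpha = 1, the hidden activation used in the QMIX mixer."""
    return x if x > 0 else math.exp(x) - 1.0


def elu_grad(x: float) -> float:
    """Derivative of ELU (alpha = 1); always strictly positive."""
    return 1.0 if x > 0 else math.exp(x)


def qmix_mix(q: Sequence[float], w1: Sequence[Sequence[float]], b1: Sequence[float],
             w2: Sequence[float], b2: float) -> Tuple[float, List[float]]:
    """Monotonic mixer: Q_tot = sum_k |w2_k| * ELU(sum_i |W1_ik| q_i + b1_k) + b2.

    In QMIX these weights come from state-conditioned hypernetworks; here
    they are passed in directly (illustrative assumption). Returns Q_tot and
    the per-agent gradients dQ_tot/dQ_i, derived analytically.
    """
    n_agents, hidden = len(q), len(b1)
    # Hidden pre-activations; abs() enforces non-negative mixing weights.
    h = [sum(abs(w1[i][k]) * q[i] for i in range(n_agents)) + b1[k]
         for k in range(hidden)]
    q_tot = sum(abs(w2[k]) * elu(h[k]) for k in range(hidden)) + b2
    # Chain rule; every term is >= 0, so Q_tot is monotonic in each Q_i.
    grads = [sum(abs(w1[i][k]) * elu_grad(h[k]) * abs(w2[k]) for k in range(hidden))
             for i in range(n_agents)]
    return q_tot, grads


def gradient_entropy(grads: Sequence[float], eps: float = 1e-8) -> float:
    """Entropy of the credit distribution p_i = g_i / sum_j g_j.

    A value near log(n_agents) means credit is spread almost uniformly;
    a value near 0 means the mixer singles out a few agents. eps keeps
    log() finite when a gradient is exactly zero.
    """
    total = sum(grads) + eps * len(grads)
    probs = [(g + eps) / total for g in grads]
    return -sum(p * math.log(p) for p in probs)


def regularized_loss(td_loss: float, grads: Sequence[float], lam: float = 0.01) -> float:
    """Hypothetical objective: TD loss plus lam * gradient entropy.

    Minimizing it penalizes near-uniform credit (assumed sign and weighting).
    """
    return td_loss + lam * gradient_entropy(grads)


if __name__ == "__main__":
    q_tot, grads = qmix_mix([1.0, 1.0], [[-2.0], [0.5]], [0.0], [-1.0], 0.5)
    print(q_tot, grads)  # 3.0 [2.0, 0.5]
    # Skewed credit yields an entropy below the uniform maximum log(2).
    print(gradient_entropy(grads), math.log(2))
```

The gradients are derived by hand here to keep the example dependency-free. A real trainer would compute the penalty inside an autodiff framework so that it backpropagates into the mixer. That requires differentiating through dQ_tot/dQ_i, which is a second-order gradient.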



Related content

Discriminator

"5G + Smart Agriculture Solutions", 22-slide PPT, Sansheng Agriculture

Zhuanzhi Member Services

56+ reads · March 23, 2022

[Berkeley, Pieter Abbeel] Foundations of Deep Reinforcement Learning, with slides and videos

Zhuanzhi Member Services

29+ reads · August 26, 2021

University of Cambridge course "Data Science: Principles and Practice", with PPT download

Zhuanzhi Member Services

54+ reads · January 20, 2021

Introduction to Linux, 96-slide PPT

Zhuanzhi Member Services

81+ reads · July 26, 2020

Deep Reinforcement Learning Policy Gradient Tutorial, 53-slide PPT

Zhuanzhi Member Services

184+ reads · February 1, 2020

Public Anxiety in Issue Communities on Social Networks, Lecturer Ta Na, School of Journalism, Renmin University of China, 8th National Conference on Social Media Processing (SMP2019)

Zhuanzhi Member Services

15+ reads · October 23, 2019

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Zhuanzhi Member Services

36+ reads · October 17, 2019

[AI in 2019: A Year in Review] Anti-AI

Zhuanzhi Member Services

79+ reads · October 10, 2019

[Carnegie Mellon University (CMU)] Deep Learning in Computer Vision: Methods, Interpretation, Causality and Fairness

Zhuanzhi Member Services

83+ reads · October 9, 2019

[UC Berkeley PhD Thesis] Learning to Generalize via Self-Supervised Prediction

Zhuanzhi Member Services

65+ reads · October 9, 2019

Related news

IEEE TII Call For Papers

CCF Technical Committee on Multimedia

3+ reads · March 24, 2022

Three Reinforcement Learning Papers: Avoiding Forgetting and More

CreateAMind

20+ reads · May 24, 2019

Hierarchically Structured Meta-learning

CreateAMind

27+ reads · May 22, 2019

CVPR2019 | 15 Paper Highlights (Covering Object Detection, Semantic Segmentation, Pose Estimation and More)

AI Yanxishe (AI研习社)

15+ reads · May 8, 2019

ICLR2019 Best Papers Announced

Zhuanzhi

12+ reads · May 6, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

43+ reads · January 3, 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+ reads · December 24, 2018

VAE-Related Papers on Representation Learning, Part 1

CreateAMind

12+ reads · September 6, 2018

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+ reads · May 25, 2018

Reinforcement Learning Family Tree

CreateAMind

26+ reads · August 2, 2017

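Related grants

Mechanism of Recipient MDSCs Regulating NK Cell Function via CEACAM1-TIM3 to Mediate Immune Tolerance in Liver Transplantation

National Natural Science Foundation of China

0+ reads · December 31, 2015

Role and Mechanism of miR-376b-Mediated Autophagy in Hydroxytyrosol Reversal of Drug Resistance in Hepatocellular Carcinoma

National Natural Science Foundation of China

0+ reads · December 31, 2015

Morse Inequalities on CR Manifolds with Group Actions

National Natural Science Foundation of China

0+ reads · December 31, 2015

Key Technologies for Constructing Trustworthy Domestic Urban Rail Transit Control Systems (iCMTCt) in Uncertain Environments

National Natural Science Foundation of China

1+ reads · December 31, 2014

Regulatory Role of Long Non-coding RNAs in Her2-Positive Breast Cancer

National Natural Science Foundation of China

0+ reads · December 31, 2014

Mechanism of Adenovirus-Mediated Arginine Deiminase Targeted Gene Therapy for Hepatocellular Carcinoma

National Natural Science Foundation of China

1+ reads · December 31, 2012

Mechanism of ICOS in Regulating Treg Proliferation and Function, and Its Role in Antitumor Immunotherapy

National Natural Science Foundation of China

0+ reads · December 31, 2012

Effects of HIV Non-structural Proteins on ABC Transporters

National Natural Science Foundation of China

0+ reads · December 31, 2011

Immune Responses to a Novel VLP Vaccine Targeting Her2/neu

National Natural Science Foundation of China

0+ reads · December 31, 2011

Effect of the Vitamin D-Related Protein VDUP1 on Airway Epithelial Cell Regulation of TSLP, a Key Molecule in Asthmatic Inflammation

National Natural Science Foundation of China

0+ reads · December 31, 2009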

Related papers

Understanding and Preventing Capacity Loss in Reinforcement Learning

Arxiv

0+ reads · April 20, 2022

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Arxiv

0+ reads · April 19, 2022

Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images

Arxiv

0+ reads · April 19, 2022

INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL

Arxiv

0+ reads · April 18, 2022

Deep Equilibrium Optical Flow Estimation

Arxiv

0+ reads · April 18, 2022

3D-aware Image Synthesis via Learning Structural and Textural Representations

Arxiv

1+ reads · April 18, 2022

A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow

Arxiv

0+ reads · April 15, 2022

Methodical Advice Collection and Reuse in Deep Reinforcement Learning

Arxiv

1+ reads · April 14, 2022

Minimizing Control for Credit Assignment with Strong Feedback

Arxiv

0+ reads · April 14, 2022

Learning Latent Representations to Influence Multi-Agent Interaction

Arxiv

11+ reads · November 12, 2020


