Birds of a Feather Flock Together: A Close Look at Cooperation Emergence via Multi-Agent RL

June 5, 2021

Heng Dong,Tonghan Wang,Jiayuan Liu,Chi Han,Chongjie Zhang

How cooperation emerges is a long-standing and interdisciplinary problem. Game-theoretical studies on social dilemmas reveal that altruistic incentives are critical to the emergence of cooperation but their analyses are limited to stateless games. For more realistic scenarios, multi-agent reinforcement learning has been used to study sequential social dilemmas (SSDs). Recent works show that learning to incentivize other agents can promote cooperation in SSDs. However, we find that, with these incentivizing mechanisms, the team cooperation level does not converge and regularly oscillates between cooperation and defection during learning. We show that a second-order social dilemma resulting from the incentive mechanisms is the main reason for such fragile cooperation. We formally analyze the dynamics of second-order social dilemmas and find that a typical tendency of humans, called homophily, provides a promising solution. We propose a novel learning framework to encourage homophilic incentives and show that it achieves stable cooperation in both SSDs of public goods and tragedy of the commons.
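The first-order and second-order dilemmas mentioned in the abstract can be illustrated with a toy, stateless public goods game. The sketch below is not the paper's learning framework or environment. The function names, the multiplier of 1.6, the bonus of 0.25, and the rule that each incentivizing agent pays a fixed bonus to every cooperating peer are all assumptions chosen only for illustration.

```python
# Toy one-shot public goods game with peer incentives.
# All names and numbers are illustrative assumptions, not taken from the paper.

def public_goods_payoffs(contributions, multiplier=1.6, endowment=1.0):
    """Return each agent's payoff before incentives.

    contributions: list of 0/1 flags, 1 = contribute the whole endowment.
    The pot is multiplied and split equally among all agents.
    """
    n = len(contributions)
    share = multiplier * endowment * sum(contributions) / n
    return [endowment * (1 - c) + share for c in contributions]


def apply_incentives(payoffs, contributions, incentivizers, bonus=0.25):
    """Every incentivizing agent pays `bonus` to each cooperating peer.

    incentivizers: list of 0/1 flags, 1 = this agent pays incentives.
    Transfers are zero-sum: what the giver pays, the receiver gets.
    """
    adjusted = list(payoffs)
    n = len(payoffs)
    for giver in range(n):
        if not incentivizers[giver]:
            continue
        for receiver in range(n):
            if receiver != giver and contributions[receiver]:
                adjusted[giver] -= bonus
                adjusted[receiver] += bonus
    return adjusted


def payoff_of_agent0(contributions, incentivizers=None):
    """Payoff of agent 0, with or without the incentive stage."""
    base = public_goods_payoffs(contributions)
    if incentivizers is None:
        return base[0]
    return apply_incentives(base, contributions, incentivizers)[0]


if __name__ == "__main__":
    everyone = [1, 1, 1, 1]
    agent0_out = [0, 1, 1, 1]

    # First-order dilemma: without incentives, defecting pays more.
    print("no incentives, cooperate:", payoff_of_agent0(everyone))
    print("no incentives, defect:   ", payoff_of_agent0(agent0_out))

    # With everyone incentivizing, defecting no longer pays.
    print("incentives, cooperate:   ", payoff_of_agent0(everyone, everyone))
    print("incentives, defect:      ", payoff_of_agent0(agent0_out, everyone))

    # Second-order dilemma: cooperate, but stop paying incentives.
    print("cooperate, free-ride:    ", payoff_of_agent0(everyone, agent0_out))
```

Under these assumed numbers, defection beats cooperation when there are no incentives, universal incentives reverse that ordering, and an agent that keeps cooperating but stops paying incentives earns more than one that keeps paying. That last comparison is one simple way to picture why incentive-based cooperation can be fragile.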
