This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations. For example, consider a group of agents navigating: rotating the state globally results in a permutation of the optimal joint policy. Existing work on symmetries in single-agent reinforcement learning can only be generalized to the fully centralized setting, because such approaches rely on the global symmetry in the full state-action space, and these symmetries can result in correspondences across agents. To encode such symmetries while still allowing distributed execution, we propose a factorization that decomposes global symmetries into local transformations. This factorization allows the computation that enforces global symmetries to be distributed over local agents and local interactions. We introduce a multi-agent equivariant policy network based on this factorization, and we show empirically on symmetric multi-agent problems that globally symmetric, distributable policies improve data efficiency compared to non-equivariant baselines.
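To make the rotation-to-permutation example concrete, the following minimal Python sketch (not the paper's architecture; `toy_policy`, `ACTION_DIRS`, `ROT90`, and `PERM` are all hypothetical names introduced here) constructs a single toy policy that is equivariant by construction and checks the property stated above: rotating the state by 90 degrees permutes the action probabilities.

```python
# Minimal sketch of the equivariance property from the abstract:
# rotating the global state should permute the action distribution.
import numpy as np

# Unit direction vectors for four actions: 0=up, 1=right, 2=down, 3=left.
ACTION_DIRS = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]])

# 90-degree counter-clockwise rotation of the plane.
ROT90 = np.array([[0.0, -1.0], [1.0, 0.0]])

# Under ROT90, action a is carried to action PERM[a]
# (e.g. "right" rotates to "up", so PERM[1] == 0).
PERM = np.array([3, 0, 1, 2])

def toy_policy(goal_dir):
    """Equivariant-by-construction policy: score each action by its
    alignment with the goal direction, then apply a softmax."""
    logits = ACTION_DIRS @ goal_dir
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

goal = np.array([0.8, 0.3])        # arbitrary goal direction
pi = toy_policy(goal)              # policy in the original frame
pi_rot = toy_policy(ROT90 @ goal)  # policy in the rotated frame

# Equivariance check: pi_rot[PERM[a]] == pi[a] for every action a.
assert np.allclose(pi_rot[PERM], pi)
```

Here the equivariance holds trivially for a single policy with global access to the state; the contribution of the paper is to enforce the analogous property on the joint policy of many agents, via the proposed factorization, while each agent still acts on only local information.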