State uncertainty poses a major challenge for decentralized coordination, where multiple agents act according to noisy observations without any access to other agents' information. Centralized training for decentralized execution (CTDE) is a multi-agent reinforcement learning paradigm, which exploits global information to learn a centralized value function in order to derive coordinated multi-agent policies. State-based CTDE has become popular in multi-agent research due to remarkable progress in the StarCraft Multi-Agent Challenge (SMAC). However, SMAC scenarios are less suited for evaluation against state uncertainty due to deterministic observations and low variance in initial states. Furthermore, state-based CTDE largely neglects state uncertainty w.r.t. decentralization of agents and observations thus being possibly less effective in more general settings. In this paper, we address state uncertainty and introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states to provide a more general and configurable benchmark. We then propose Attention-based Embeddings of Recurrency In multi-Agent Learning (AERIAL) to approximate value functions under consideration of state uncertainty. AERIAL replaces the true state in CTDE with the memory representation of all agents' recurrent functions, which are processed by a self-attention mechanism. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. We also evaluate the robustness of AERIAL and state-based CTDE against various configurations of state uncertainty in MessySMAC.
翻译:国家不确定性对分散协调构成重大挑战,因为多代理人根据噪音观测采取行动,而没有获得其他代理人的信息; 分散执行的集中培训是一个多代理人强化学习模式,它利用全球信息学习集中价值功能,以获得协调的多代理人政策; 以国家为基础的CTDE在多代理人研究中越来越受欢迎,因为StarCraft多代理人挑战(SMAC)取得了显著进展; 然而,由于确定性观测和最初各州差异低,SMAC的情景不太适合评价国家不确定性。 此外,基于国家的CTDE基本上忽视了国家的不确定性(t.r.t.),因此在更普遍的环境下,对代理人和观察的权力下放可能不太有效。 在本文件中,我们处理国家不确定性问题,并介绍MessySMAC,一个经过修改的SMAC的版本,该版本具有质疑性的观察,最初各州差异更大。 然而,我们提议在多代理人学习(AERIAL)中,根据基于关注的注意力,对基于国家不确定性的不确定性的不确定性进行估算。