Despite their potential in real-world applications, multi-agent reinforcement learning (MARL) algorithms often suffer from high sample complexity. To address this issue, we present a novel model-based MARL algorithm, BiLL (Bi-Level Latent Variable Model-based Learning), that learns a bi-level latent variable model from high-dimensional inputs. At the top level, the model learns latent representations of the global state, which encode global information relevant to behavior learning. At the bottom level, it learns latent representations for each agent, conditioned on the global latent representations from the top level. The model then generates latent trajectories for policy learning. We evaluate our algorithm on complex multi-agent tasks in the challenging SMAC and Flatland environments. Our algorithm outperforms state-of-the-art model-free and model-based baselines in sample efficiency, including on two extremely challenging Super Hard SMAC maps.
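To make the bi-level structure concrete, below is a minimal, hypothetical sketch in PyTorch of how such a model could be organized: a top-level encoder maps the global state to a global latent, and bottom-level encoders produce per-agent latents conditioned on that global latent. All module names, layer sizes, and the Gaussian reparameterization are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of a bi-level latent variable model (not the paper's code).
import torch
import torch.nn as nn


class GlobalEncoder(nn.Module):
    """Top level: encodes the high-dimensional global state into a latent z_g."""

    def __init__(self, state_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),  # outputs mean and log-std
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        mean, log_std = self.net(state).chunk(2, dim=-1)
        # Reparameterized sample of the global latent.
        return mean + log_std.exp() * torch.randn_like(mean)


class AgentEncoder(nn.Module):
    """Bottom level: per-agent latent z_i, conditioned on the global latent z_g."""

    def __init__(self, obs_dim: int, global_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + global_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * latent_dim),
        )

    def forward(self, obs: torch.Tensor, z_global: torch.Tensor) -> torch.Tensor:
        mean, log_std = self.net(torch.cat([obs, z_global], dim=-1)).chunk(2, dim=-1)
        return mean + log_std.exp() * torch.randn_like(mean)


# Example usage with assumed shapes: 4 agents, 64-dim global state, 32-dim observations.
g_enc = GlobalEncoder(state_dim=64, latent_dim=16)
a_enc = AgentEncoder(obs_dim=32, global_dim=16, latent_dim=8)
state = torch.randn(1, 64)
z_g = g_enc(state)
z_agents = [a_enc(torch.randn(1, 32), z_g) for _ in range(4)]
```

A latent dynamics model over z_g and the joint actions would then roll this state forward to generate the imagined latent trajectories used for policy learning; that component is omitted here for brevity.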