Recently, model-based agents have outperformed model-free ones in single-agent environments under the same computational budget and training time. However, due to the complexity of multi-agent systems, it is difficult to learn an accurate model of the environment, and significant compounding errors can hinder learning when model-based methods are applied to multi-agent tasks. This paper proposes an implicit model-based multi-agent reinforcement learning method built on value decomposition. With this method, agents interact with a learned virtual environment and evaluate the current state value according to imagined future states in the latent space, which gives them foresight. Our approach can be applied to any multi-agent value decomposition method. Experimental results show that our method improves sample efficiency across different partially observable Markov decision process domains.
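To make the high-level idea concrete, the sketch below illustrates one plausible reading of the abstract, assuming a QMIX-style value decomposition and a simple latent dynamics model used for imagination. It is not the authors' implementation; all module names, shapes, and the rollout horizon are illustrative assumptions.

```python
# Minimal sketch (illustrative only): imagine future latent states with a learned
# dynamics model, then evaluate a joint value via value-decomposition mixing.
import torch
import torch.nn as nn


class LatentDynamics(nn.Module):
    """Predicts the next latent state from the current latent state and joint action."""
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))


class AgentQ(nn.Module):
    """Per-agent utility computed from that agent's latent state."""
    def __init__(self, latent_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, z):
        return self.net(z)


class Mixer(nn.Module):
    """Simple monotonic mixer: combines per-agent utilities into a joint value."""
    def __init__(self, n_agents):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(n_agents, 1))

    def forward(self, utilities):                      # utilities: [batch, n_agents]
        return utilities @ torch.abs(self.weight)      # abs weights keep mixing monotonic


def imagine_and_evaluate(encoder, dynamics, agent_qs, mixer, obs, joint_action, horizon=3):
    """Encode observations, roll the latent model forward for `horizon` imagined steps,
    then mix per-agent utilities at the imagined state into a joint value."""
    z = encoder(obs)                                   # [batch, n_agents, latent_dim]
    for _ in range(horizon):
        z = dynamics(z, joint_action)                  # imagined next latent state
    utilities = torch.stack([q(z[:, i]) for i, q in enumerate(agent_qs)], dim=1)
    return mixer(utilities.max(dim=-1).values)         # joint value of imagined future


if __name__ == "__main__":
    n_agents, obs_dim, latent_dim, action_dim, n_actions = 3, 10, 16, 5, 5
    encoder = nn.Linear(obs_dim, latent_dim)           # stand-in observation encoder
    dynamics = LatentDynamics(latent_dim, action_dim)
    agent_qs = [AgentQ(latent_dim, n_actions) for _ in range(n_agents)]
    mixer = Mixer(n_agents)
    obs = torch.randn(4, n_agents, obs_dim)
    joint_action = torch.randn(4, n_agents, action_dim)
    print(imagine_and_evaluate(encoder, dynamics, agent_qs, mixer, obs, joint_action).shape)
```

Because imagination happens entirely in the latent space, no per-step observation reconstruction is needed, which is one common way to limit the compounding error the abstract mentions.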