Recently, model-based agents have achieved better performance than model-free ones under the same computational budget and training time in single-agent environments. However, due to the complexity of multi-agent systems, learning a model of the environment is very difficult. When model-based methods are applied to multi-agent tasks, significant compounding errors may hinder the learning process. In this paper, we propose an implicit model-based multi-agent reinforcement learning method built on value decomposition. With this method, agents can interact with a learned virtual environment and evaluate the current state value according to imagined future states, which gives them foresight. Our method can be combined with any multi-agent value decomposition method. Experimental results show that it improves sample efficiency in partially observable Markov decision process domains.
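To make the idea of "evaluating the current state value according to imagined future states" concrete, the following is a minimal, illustrative sketch rather than the paper's actual implementation: a learned dynamics/reward model is unrolled for a few imagined steps, and the imagined rewards are combined with a bootstrapped, VDN-style sum of per-agent utilities at the last imagined state. The class and function names (WorldModel, AgentQ, imagined_value), the parameter-shared agent network, the specific mixer, and the rollout horizon are all assumptions for illustration.

```python
# Illustrative sketch only (assumed details, not the paper's code): imagined
# rollouts in a learned world model combined with a value-decomposition estimate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldModel(nn.Module):
    """Predicts the next joint observation and joint reward from the current
    joint observation and one-hot joint actions (hypothetical model)."""
    def __init__(self, obs_dim, n_agents, n_actions, hidden=64):
        super().__init__()
        in_dim = n_agents * (obs_dim + n_actions)
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.next_obs = nn.Linear(hidden, n_agents * obs_dim)
        self.reward = nn.Linear(hidden, 1)
        self.n_agents, self.obs_dim = n_agents, obs_dim

    def forward(self, obs, actions_onehot):
        h = self.trunk(torch.cat([obs.flatten(1), actions_onehot.flatten(1)], -1))
        return self.next_obs(h).view(-1, self.n_agents, self.obs_dim), self.reward(h)

class AgentQ(nn.Module):
    """Per-agent utility Q_i(o_i, .), shared across agents for simplicity."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):          # obs: (batch, n_agents, obs_dim)
        return self.net(obs)         # -> (batch, n_agents, n_actions)

def imagined_value(obs, agent_q, world_model, horizon=3, gamma=0.99):
    """H-step model-based value expansion (assumed formulation): accumulate
    imagined rewards, then bootstrap with the decomposed joint Q at the
    final imagined state."""
    value, discount = 0.0, 1.0
    for _ in range(horizon):
        q_i = agent_q(obs)
        greedy = q_i.argmax(dim=-1)                       # greedy per-agent actions
        a_onehot = F.one_hot(greedy, q_i.size(-1)).float()
        obs, reward = world_model(obs, a_onehot)          # imagine the next step
        value = value + discount * reward
        discount *= gamma
    # VDN-style mixing: joint value as the sum of per-agent greedy utilities.
    q_tot = agent_q(obs).max(dim=-1).values.sum(dim=-1, keepdim=True)
    return value + discount * q_tot
```

Because the mixer here is a plain sum, it could be swapped for any monotonic mixing network (e.g., a QMIX-style mixer), which is consistent with the claim that the approach applies to any value decomposition method.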