Recently, hierarchical reinforcement learning methods have solved several challenging tasks in multi-agent systems. Inspired by the intra-level and inter-level coordination in the human nervous system, we propose HAVEN, a novel value decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems. To address the instability arising from the concurrent optimization of policies across levels and agents, we introduce a dual coordination mechanism for inter-level and inter-agent strategies by designing reward functions in a two-level hierarchy. HAVEN requires neither domain knowledge nor pre-training, and can be applied to any value decomposition variant. Our method achieves desirable results on different decentralized partially observable Markov decision process domains and outperforms other popular multi-agent hierarchical reinforcement learning algorithms.
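The abstract gives no formulas, but the general two-timescale pattern it alludes to can be illustrated. Below is a minimal, single-agent, tabular sketch, not HAVEN itself: the value decomposition, the inter-agent coordination term, and the paper's actual reward design are omitted, and every concrete choice (the toy ChainEnv task, the interval K, the weight BETA, the value-difference bonus used as the inter-level reward) is an assumption made for illustration only.

```python
import random
from collections import defaultdict

K = 4          # high-level decision interval (assumed, not from the paper)
BETA = 0.5     # weight of the inter-level reward bonus (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q_hi = defaultdict(float)   # Q[(state, macro)] for the high-level policy
q_lo = defaultdict(float)   # Q[(state, macro, action)] for the low-level policy

def eps_greedy(q, keys):
    """Pick a key epsilon-greedily by its Q-value."""
    if random.random() < EPS:
        return random.choice(keys)
    return max(keys, key=lambda k: q[k])

class ChainEnv:
    """Toy 1-D chain: reach position N for reward 1 (illustrative only)."""
    macros = (0, 1)        # abstract high-level intents
    actions = (-1, +1)     # primitive moves
    N = 8
    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos
    def step(self, a):
        self.pos = max(0, min(self.N, self.pos + a))
        self.steps += 1
        done = self.pos == self.N or self.steps >= 50
        return self.pos, float(self.pos == self.N), done

def run_episode(env):
    s, done = env.reset(), False
    while not done:
        # High level commits to a macro strategy for the next K steps.
        s_hi = s
        macro = eps_greedy(q_hi, [(s, g) for g in env.macros])[1]
        r_macro, disc = 0.0, 1.0
        for _ in range(K):
            a = eps_greedy(q_lo, [(s, macro, b) for b in env.actions])[2]
            s2, r_env, done = env.step(a)
            # Inter-level coordination: shape the low-level reward with a
            # bonus for transitions that raise the high level's value
            # estimate (one possible proxy for "following the strategy").
            r_lo = r_env + BETA * (q_hi[(s2, macro)] - q_hi[(s, macro)])
            best_lo = 0.0 if done else max(q_lo[(s2, macro, b)] for b in env.actions)
            q_lo[(s, macro, a)] += ALPHA * (r_lo + GAMMA * best_lo - q_lo[(s, macro, a)])
            r_macro += disc * r_env
            disc *= GAMMA
            s = s2
            if done:
                break
        # High level learns on its own timescale from the accumulated
        # environment reward (an SMDP-style Q-learning update).
        best_hi = 0.0 if done else max(q_hi[(s, g)] for g in env.macros)
        q_hi[(s_hi, macro)] += ALPHA * (r_macro + disc * best_hi - q_hi[(s_hi, macro)])

env = ChainEnv()
for _ in range(500):
    run_episode(env)
```

The point of the sketch is the timescale split: the low level is trained on a shaped reward that couples it to the high level's strategy, while the high level is updated only at macro boundaries, which is one way to mitigate the non-stationarity of optimizing both levels at once.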