Value decomposition methods have gradually become popular in cooperative multi-agent reinforcement learning. However, almost all value decomposition methods follow the Individual Global Max (IGM) principle or one of its variants, which restricts the range of problems these methods can solve. Inspired by the notion of dual self-awareness in psychology, we propose a dual self-awareness value decomposition framework that entirely rejects the IGM premise. Each agent consists of an ego policy that carries out actions and an alter ego value function that takes part in credit assignment. By using an explicit search procedure, the value function factorization can dispense with the IGM assumption. We also propose a novel anti-ego exploration mechanism that keeps the algorithm from getting stuck in local optima. As the first fully IGM-free value decomposition method, our framework achieves desirable performance on a variety of cooperative tasks.
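The sketch below illustrates why an IGM-style factorization can limit what value decomposition solves. It is not the authors' framework. IGM requires that each agent's greedy action on its own utility, combined across agents, gives the greedy joint action of the global value. The sketch makes three assumptions: a VDN-style additive factorization fitted by least squares, a two-agent non-monotonic matrix game whose payoff values are chosen for illustration, and a brute-force enumeration as a stand-in for "explicit search". The helper names (`additive_factorization`, `greedy`, `explicit_search`) are hypothetical. The additive model satisfies IGM by construction, yet its greedy joint action is suboptimal on the true payoff. Searching over joint actions directly recovers the optimum.

```python
from itertools import product

# Two-agent non-monotonic matrix game (illustrative values): PAYOFF[a1][a2]
# is the team reward for the joint action (a1, a2).
PAYOFF = [
    [8, -12, -12],
    [-12, 0, 0],
    [-12, 0, 0],
]


def additive_factorization(payoff):
    """Least-squares fit payoff[i][j] ~ q1[i] + q2[j] over a full table.

    With uniform weights, one exact solution is q1 = row means and
    q2 = column means minus the grand mean.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    grand_mean = sum(sum(row) for row in payoff) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols for row in payoff]
    q2 = [sum(payoff[i][j] for i in range(n_rows)) / n_rows - grand_mean
          for j in range(n_cols)]
    return q1, q2


def greedy(values):
    """Index of the first maximal entry (each agent acts on its own utility)."""
    return max(range(len(values)), key=values.__getitem__)


def explicit_search(score, n_rows, n_cols):
    """Enumerate every joint action and return the first one maximizing score."""
    return max(product(range(n_rows), range(n_cols)), key=lambda a: score(*a))


q1, q2 = additive_factorization(PAYOFF)
decentralized = (greedy(q1), greedy(q2))
# IGM holds for the additive model: searching its fitted joint value
# yields the same joint action as the per-agent greedy choices.
fitted_joint = explicit_search(lambda i, j: q1[i] + q2[j], 3, 3)
# Searching the true joint payoff finds the actual optimum.
true_joint = explicit_search(lambda i, j: PAYOFF[i][j], 3, 3)

print("decentralized greedy:", decentralized, "payoff", PAYOFF[decentralized[0]][decentralized[1]])
print("explicit search:", true_joint, "payoff", PAYOFF[true_joint[0]][true_joint[1]])
```

Under these assumptions, the agents' greedy choices give joint action (1, 1) with payoff 0, while enumerating joint actions finds (0, 0) with payoff 8. Exhaustive enumeration scales exponentially with the number of agents, so it only serves to show what an IGM-free search gains. It says nothing about how the proposed framework performs its search.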