WQMIX, QMIX, QTRAN, and VDN are state-of-the-art (SOTA) algorithms for decentralized partially observable Markov decision processes (Dec-POMDPs). None of them can solve domains that require complex cooperation among agents. We propose SA2MA, a two-stage algorithm for such problems. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using this single-agent policy. SA2MA has a clear advantage over all of these competitors in complex cooperative multi-agent domains.
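The two-stage idea can be illustrated with a minimal sketch. The abstract does not say how the single-agent policy enters the second stage, so everything below is an assumption made for illustration: a toy corridor task, tabular Q-learning in stage one, and a warm start of each agent's Q-table in stage two. The names `train_single`, `warm_start`, and `team_rollout` are hypothetical and are not taken from SA2MA.

```python
import random

N = 6                # toy corridor with cells 0..N-1; the goal is cell N-1 (assumed task)
ACTIONS = (-1, +1)   # action 0 moves left, action 1 moves right


def step(pos, a):
    """Move one cell in the chosen direction, clipped to the corridor."""
    return min(max(pos + ACTIONS[a], 0), N - 1)


def train_single(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Stage 1 (assumed form): tabular Q-learning for one agent alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N)]
    for _ in range(episodes):
        pos = rng.randrange(N - 1)       # random non-goal start cell
        for _ in range(50):
            a = rng.randrange(2)         # uniform random behaviour; Q-learning is off-policy
            nxt = step(pos, a)
            done = nxt == N - 1
            # Reward 1 on reaching the goal, otherwise bootstrap from the next cell.
            target = 1.0 if done else gamma * max(q[nxt])
            q[pos][a] += alpha * (target - q[pos][a])
            if done:
                break
            pos = nxt
    return q


def warm_start(q_single, n_agents=2):
    """Stage 2 (assumed form): each agent gets its own copy of the single-agent table."""
    return [[row[:] for row in q_single] for _ in range(n_agents)]


def team_rollout(qs, max_steps=50):
    """Greedy joint rollout from cell 0; an agent that reaches the goal stays there.

    Returns the number of steps until every agent is at the goal, or None.
    """
    pos = [0] * len(qs)
    for t in range(1, max_steps + 1):
        for i, q in enumerate(qs):
            if pos[i] != N - 1:
                a = max((0, 1), key=lambda k: q[pos[i]][k])
                pos[i] = step(pos[i], a)
        if all(p == N - 1 for p in pos):
            return t
    return None


if __name__ == "__main__":
    q = train_single()
    agents = warm_start(q)
    print("steps until all agents reach the goal:", team_rollout(agents))
```

In a realistic setting the warm-started tables or networks would then be fine-tuned with a multi-agent learner on the team reward. That step is omitted here because the abstract gives no detail about it.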