In this paper, we propose actor-critic approaches that introduce an actor policy on top of QMIX [9]. Doing so removes the monotonicity constraint of QMIX and yields a non-monotonic value function factorization of the joint action-value function. We evaluate our actor-critic methods on StarCraft II micromanagement tasks and show that they achieve stronger performance on maps with heterogeneous agent types.
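
To make the monotonicity constraint concrete, the following is a minimal sketch and not the method proposed here: it contrasts a QMIX-style mixer, whose mixing weights are forced to be non-negative, with the same mixer left unconstrained. The function name `mix`, the fixed weight values, and the use of ReLU are illustrative assumptions; QMIX itself generates the mixing weights from the global state with hypernetworks, which is omitted.

```python
import numpy as np


def mix(agent_qs, w1, b1, w2, b2, monotonic):
    """Hypothetical two-layer mixer: per-agent Q-values -> scalar Q_tot.

    Assumption: the weights are fixed arrays. In QMIX they are produced
    by hypernetworks conditioned on the global state.
    """
    if monotonic:
        # QMIX-style constraint: non-negative mixing weights, so Q_tot
        # cannot decrease when any single agent's Q-value increases.
        w1, w2 = np.abs(w1), np.abs(w2)
    # ReLU is used for simplicity (an assumption); any non-decreasing
    # activation preserves monotonicity.
    hidden = np.maximum(agent_qs @ w1 + b1, 0.0)
    return float(hidden @ w2 + b2)


# Illustrative weights for 2 agents and 2 hidden units (made-up values).
w1 = np.array([[-1.0, 0.5], [0.3, -0.8]])
b1 = np.array([1.0, 1.0])
w2 = np.array([1.0, -0.5])
b2 = 0.0

low = np.array([0.0, 0.0])   # per-agent Q-values
high = np.array([1.0, 0.0])  # agent 0's Q-value raised, agent 1 unchanged

q_mono_low = mix(low, w1, b1, w2, b2, monotonic=True)
q_mono_high = mix(high, w1, b1, w2, b2, monotonic=True)
q_free_low = mix(low, w1, b1, w2, b2, monotonic=False)
q_free_high = mix(high, w1, b1, w2, b2, monotonic=False)

print("monotonic mixer:    ", q_mono_low, "->", q_mono_high)
print("unconstrained mixer:", q_free_low, "->", q_free_high)
```

With these weights, raising agent 0's Q-value raises Q_tot under the constrained mixer but lowers it under the unconstrained one. A non-monotonic factorization can represent joint action-values of the second kind, which a monotonic mixer cannot.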