In this paper, I propose actor-critic approaches that introduce an actor policy on top of QMIX ([1]). Doing so can remove QMIX's monotonicity constraint and allows a non-monotonic value function factorization of the joint action-value function.