Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm for multi-agent reinforcement learning. One of its main features is making full use of global information to learn a better joint $Q$-function or centralized critic. In this paper, we instead explore how to leverage global information to directly learn a better individual $Q$-function or individual actor. We find that applying the same global information to all agents indiscriminately is not sufficient for good performance, and we therefore propose to specialize the global information for each agent, obtaining agent-specific global information that yields better performance. Furthermore, we distill this agent-specific global information into each agent's local information, so that it can be used during decentralized execution without much performance degradation. We call this new paradigm Personalized Training with Distilled Execution (PTDE). PTDE can be easily combined with many state-of-the-art algorithms to further improve their performance, which is verified in both SMAC and Google Research Football scenarios.
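The two-stage idea described above (a personalized teacher trained with agent-specific global information, then distilled into a student that uses only local information) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the network sizes, names (`TeacherQ`, `StudentQ`), and the plain MSE distillation loss are all assumptions.

```python
import torch
import torch.nn as nn

class TeacherQ(nn.Module):
    """Personalized training: Q conditioned on local obs plus
    agent-specific global information (hypothetical architecture)."""
    def __init__(self, obs_dim, gi_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + gi_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs, agent_global_info):
        return self.net(torch.cat([obs, agent_global_info], dim=-1))

class StudentQ(nn.Module):
    """Distilled execution: Q computed from local information only,
    so it can run decentralized without global state."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

obs_dim, gi_dim, n_actions = 8, 16, 5
teacher = TeacherQ(obs_dim, gi_dim, n_actions)
student = StudentQ(obs_dim, n_actions)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

obs = torch.randn(32, obs_dim)   # batch of local observations
gi = torch.randn(32, gi_dim)     # agent-specific global information

# One distillation step: the teacher is frozen; the student matches
# its Q-values using only the local observation.
with torch.no_grad():
    target_q = teacher(obs, gi)
loss = nn.functional.mse_loss(student(obs), target_q)
opt.zero_grad()
loss.backward()
opt.step()
```

At execution time only `student` is deployed, which is what allows decentralized execution after the centralized, globally informed training stage.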