Centralized Training for Decentralized Execution (CTDE), where agents are trained offline using centralized information but execute online in a decentralized manner, has gained popularity in the multi-agent reinforcement learning community. In particular, actor-critic methods with a centralized critic and decentralized actors are a common instance of this idea. However, although the centralized critic is the standard choice in many algorithms, the implications of using one in this context are not fully discussed or understood. We therefore formally analyze centralized and decentralized critic approaches, providing a deeper understanding of the implications of critic choice. Because our theory relies on unrealistic assumptions, we also empirically compare centralized and decentralized critic methods across a wide range of environments to validate our theory and to provide practical advice. We show that there are misconceptions regarding centralized critics in the current literature and demonstrate that the centralized critic design is not strictly beneficial; rather, centralized and decentralized critics have different pros and cons that algorithm designers should take into account.
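To make the architectural distinction concrete, the following is a minimal sketch (not the paper's implementation) of the two critic designs: a decentralized critic conditions only on an agent's local observation, while a centralized critic conditions on the joint observation of all agents. Names such as `obs_dim`, `n_agents`, and the hidden size are illustrative assumptions.

```python
# Minimal sketch contrasting decentralized and centralized critics.
# Dimensions and hidden sizes are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn


class DecentralizedCritic(nn.Module):
    """Value critic conditioned only on one agent's local observation."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, local_obs: torch.Tensor) -> torch.Tensor:
        return self.net(local_obs)


class CentralizedCritic(nn.Module):
    """Value critic conditioned on the joint observation of all agents."""

    def __init__(self, obs_dim: int, n_agents: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        # joint_obs concatenates every agent's observation along the last axis,
        # so this critic uses centralized information available only in training.
        return self.net(joint_obs)


if __name__ == "__main__":
    n_agents, obs_dim = 2, 8
    obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
    dec = DecentralizedCritic(obs_dim)
    cen = CentralizedCritic(obs_dim, n_agents)
    print(dec(obs[0]).shape, cen(torch.cat(obs, dim=-1)).shape)
```

In either case the actors remain decentralized at execution time; only the critic's input differs, which is the design choice whose consequences the analysis examines.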