We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
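To make the core idea concrete, below is a minimal sketch of the centralized-critic, decentralized-actor structure the abstract describes: each agent keeps its own policy over local observations, while its critic conditions on all agents' observations and actions, sidestepping the non-stationarity that plagues independent Q-learning. The module names, layer sizes, and dimensions here are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized policy: maps one agent's local observation to an action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Critic conditioned on ALL agents' observations and actions, so its
    learning target stays stationary even as other agents' policies change."""
    def __init__(self, total_obs_dim, total_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(total_obs_dim + total_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_acts):
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# Hypothetical sizes for illustration: 3 agents, 8-dim observations, 2-dim actions.
n_agents, obs_dim, act_dim = 3, 8, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [CentralizedCritic(n_agents * obs_dim, n_agents * act_dim)
           for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                 # one observation per agent
acts = torch.stack([a(o) for a, o in zip(actors, obs)])
q_0 = critics[0](obs.flatten(), acts.flatten())      # agent 0's centralized value
```

At execution time only the actors are needed, each acting from its own observation; the centralized critics are used purely during training, which is what permits decentralized deployment despite centralized learning.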