We propose using regularization for Multi-Agent Reinforcement Learning (MARL) rather than learning explicit cooperative structures, in an approach we call {\em Multi-Agent Regularized Q-learning} (MARQ). Many MARL approaches leverage centralized structures in order to exploit global state information or to remove communication constraints when the agents act in a decentralized manner. Instead of learning redundant structures that are discarded during agent execution, we propose to leverage the shared experiences of the agents to regularize the individual policies and thereby promote structured exploration. We examine several approaches by which MARQ can either explicitly or implicitly regularize policies in a multi-agent setting. MARQ aims to address these limitations in the MARL context by applying regularization constraints that correct bias in off-policy, out-of-distribution agent experiences and promote diverse exploration. We evaluate our algorithm on several benchmark multi-agent environments and show that MARQ consistently outperforms several baselines and state-of-the-art algorithms, learning in fewer steps and converging to higher returns.
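To make the regularization idea concrete, the following is a minimal sketch of the general shape such a per-agent objective could take, assuming a conservative (CQL-style) penalty on out-of-distribution actions; the symbols $Q_i$, $\bar{Q}_i$, the shared replay buffer $\mathcal{D}$, and the weight $\alpha$ are illustrative assumptions, not MARQ's exact formulation, which is specified in the body of the paper:
\[
\min_{Q_i}\;
\underbrace{\mathbb{E}_{(s,a,r,s')\sim\mathcal{D}}\!\Big[\big(Q_i(s,a) - r - \gamma \max_{a'} \bar{Q}_i(s',a')\big)^2\Big]}_{\text{Q-learning loss on shared experiences}}
\;+\;
\alpha\,
\underbrace{\mathbb{E}_{s\sim\mathcal{D}}\!\Big[\log\!\sum_{a}\exp Q_i(s,a) - \mathbb{E}_{a\sim\mathcal{D}}\big[Q_i(s,a)\big]\Big]}_{\text{penalty on out-of-distribution actions}}
\]
Here each agent $i$ fits its own Q-function to experiences pooled across agents, while the second term down-weights actions unsupported by the shared data, which is one way a regularizer can correct off-policy, out-of-distribution bias without any centralized structure at execution time.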