Despite the rapid development of multi-agent reinforcement learning (MARL) methods, the field lacks commonly acknowledged baseline implementations and evaluation platforms. MARL researchers therefore urgently need an integrated library suite, playing a role similar to that of RLlib in single-agent RL, that delivers reliable MARL implementations and replicable evaluation across various benchmarks. To fill this gap, we propose Multi-Agent RLlib (MARLlib), a comprehensive MARL algorithm library that builds on RLlib to solve multi-agent problems. With a novel agent-level distributed dataflow design, MARLlib unifies tens of algorithms, including different types of independent learning, centralized critic, and value decomposition methods; this yields a highly composable integration of MARL algorithms that could not previously be unified. Furthermore, MARLlib goes beyond existing work by integrating diverse environment interfaces and providing flexible parameter sharing strategies; this lets end users create versatile solutions to cooperative, competitive, and mixed tasks with minimal code modifications. We conduct extensive experiments to substantiate the correctness of our implementation, and from them derive new insights into the relationship between performance and the design of algorithmic components. With MARLlib, we expect researchers to be able to tackle a broader range of real-world multi-agent problems with trustworthy solutions. Our code\footnote{\url{https://github.com/Replicable-MARL/MARLlib}} and documentation\footnote{\url{https://marllib.readthedocs.io/}} are publicly available.