Being able to harness the power of large, static datasets for developing autonomous multi-agent systems could unlock enormous value for real-world applications. Many important industrial systems are multi-agent in nature and are difficult to model using bespoke simulators. However, in industry, distributed system processes can often be recorded during operation, and large quantities of demonstrative data can be stored. Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective online controllers from static datasets. However, offline MARL is still in its infancy, and, therefore, lacks standardised benchmarks, baselines and evaluation protocols typically found in more mature subfields of RL. This deficiency makes it difficult for the community to sensibly measure progress. In this work, we aim to fill this gap by releasing \emph{off-the-grid MARL (OG-MARL)}: a framework for generating offline MARL datasets and algorithms. We release an initial set of datasets and baselines for cooperative offline MARL, created using the framework, along with a standardised evaluation protocol. Our datasets provide settings that are characteristic of real-world systems, including complex dynamics, non-stationarity, partial observability, suboptimality and sparse rewards, and are generated from popular online MARL benchmarks. We hope that OG-MARL will serve the community and help steer progress in offline MARL, while also providing an easy entry point for researchers new to the field.