Multi-agent reinforcement learning experiments and open-source training environments are typically limited in scale, supporting tens, or occasionally hundreds, of interacting agents. In this paper we demonstrate the use of Vogue, a high-performance agent-based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting agents while maintaining high training throughput by running both the environment and the reinforcement learning (RL) agents on the GPU. High-performance multi-agent environments at this scale have the potential to enable the learning of robust and flexible policies for use in ABMs and simulations of complex systems. We demonstrate training performance with two newly developed, large-scale multi-agent training environments. Moreover, we show that shared RL policies can be trained in these environments on timescales of minutes to hours.