Competitive Self-Play (CSP) based Multi-Agent Reinforcement Learning (MARL) has achieved phenomenal breakthroughs recently. Strong AIs have been produced for several benchmarks, including Dota 2, Glory of Kings, Quake III, and StarCraft II, to name a few. Despite these successes, MARL training is extremely data hungry, typically requiring billions (if not trillions) of frames to be collected from the environment during training in order to learn a high-performance agent. This poses non-trivial difficulties for researchers and engineers and prevents the application of MARL to a broader range of real-world problems. To address this issue, in this manuscript we describe a framework, referred to as TLeague, that targets large-scale training and implements several mainstream CSP-MARL algorithms. Training can be deployed on either a single machine or a cluster of hybrid machines (CPUs and GPUs), and standard Kubernetes is supported in a cloud-native manner. TLeague achieves high throughput and reasonable scale-up when performing distributed training. Thanks to its modular design, it is also easy to extend for solving other multi-agent problems or for implementing and verifying MARL algorithms. We present experiments on StarCraft II, ViZDoom, and Pommerman to show the efficiency and effectiveness of TLeague. The code is open-sourced and available at https://github.com/tencent-ailab/tleague_projpage
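To make the CSP training scheme concrete, the following is a minimal Python sketch of league-based opponent matchmaking via prioritized fictitious self-play (PFSP), a standard ingredient of CSP-MARL training; the `League` class and `pfsp_weights` function are illustrative assumptions for exposition, not TLeague's actual API.

```python
import numpy as np

# Illustrative sketch of CSP matchmaking, NOT TLeague's real interface.
# A league keeps frozen snapshots of past policies; the current learner
# is matched against snapshots it struggles with most (PFSP weighting).

def pfsp_weights(win_rates, p=2.0):
    """Weight each opponent by (1 - win_rate)^p, normalized to a
    probability distribution; harder opponents get sampled more often."""
    w = (1.0 - np.asarray(win_rates, dtype=float)) ** p
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))

class League:
    def __init__(self):
        self.snapshots = []   # frozen historical policy parameters
        self.win_rates = []   # learner's estimated win-rate vs. each snapshot

    def add_snapshot(self, policy_params):
        self.snapshots.append(policy_params)
        self.win_rates.append(0.5)  # prior: even odds against a new snapshot

    def sample_opponent(self, rng=np.random):
        idx = rng.choice(len(self.snapshots), p=pfsp_weights(self.win_rates))
        return idx, self.snapshots[idx]

    def report_result(self, idx, learner_won, lr=0.1):
        # Running estimate of the learner's win-rate against snapshot idx.
        self.win_rates[idx] += lr * (float(learner_won) - self.win_rates[idx])
```

In a distributed setting, actors would call `sample_opponent` when starting a match and `report_result` when it ends, while the learner periodically calls `add_snapshot` with its current parameters; the exponent `p` trades off focusing on the hardest opponents against maintaining coverage of the whole league.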