In recent years, researchers have achieved great success in applying Deep Reinforcement Learning (DRL) algorithms to Real-time Strategy (RTS) games, creating strong autonomous agents that can defeat professional players in StarCraft~II. However, existing approaches that tackle full games have high computational costs, usually requiring thousands of GPUs and CPUs for weeks. This paper makes two main contributions to address this issue: 1) we introduce Gym-$\mu$RTS (pronounced "gym-micro-RTS") as a fast-to-run RL environment for full-game RTS research, and 2) we present a collection of techniques for scaling DRL to play full-game $\mu$RTS, along with ablation studies demonstrating their empirical importance. Our best-trained bot defeats every $\mu$RTS bot we tested from past $\mu$RTS competitions in a single-map setting, resulting in a state-of-the-art DRL agent that requires only about 60 hours of training on a single machine (one GPU, three vCPUs, 16 GB of RAM). See the blog post at https://wandb.ai/vwxyzjn/gym-microrts-paper/reports/Gym-RTS-Toward-Affordable-Deep-Reinforcement-Learning-Research-in-Real-Time-Strategy-Games--Vmlldzo2MDIzMTg and the source code at https://github.com/vwxyzjn/gym-microrts-paper
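To give a concrete feel for what "fast-to-run RL environment" means in practice, here is a minimal interaction-loop sketch. It is not taken from the paper; the class name, constructor arguments, and action format are assumptions based on the linked gym-microrts repository and may not match any particular released version.

```python
# A minimal Gym-style interaction loop. Everything below is an assumption based on
# the gym-microrts repository linked above; class names, constructor arguments,
# and the action format may differ between released versions.
import numpy as np
from gym_microrts import microrts_ai                          # scripted opponent bots
from gym_microrts.envs.vec_env import MicroRTSGridModeVecEnv  # assumed vectorized env class

envs = MicroRTSGridModeVecEnv(
    num_selfplay_envs=0,
    num_bot_envs=1,                                   # a single environment against one bot
    max_steps=2000,
    ai2s=[microrts_ai.coacAI],                        # CoacAI, a past competition winner
    map_paths=["maps/16x16/basesWorkers16x16.xml"],   # the single-map setting
    reward_weight=np.array([10.0, 1.0, 1.0, 0.2, 1.0, 4.0]),  # shaped-reward weights (assumed)
)

obs = envs.reset()
for _ in range(1000):
    # Sample one flat per-cell action vector per environment. A real agent would
    # instead consult the environment's action masks so that only valid actions are issued.
    actions = np.array([envs.action_space.sample() for _ in range(envs.num_envs)])
    obs, rewards, dones, infos = envs.step(actions)
envs.close()
```

The repository's training scripts do not sample uniformly as above; they rely on masking out invalid actions, which is one of the scaling techniques the paper studies.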