There has been significant progress in developing reinforcement learning (RL) training systems. Prior works such as IMPALA, Apex, Seed RL, and Sample Factory aim to improve the overall throughput of the system. In this paper, we address a common bottleneck in RL training systems, namely parallel environment execution, which is often the slowest part of the whole system yet receives little attention. With a curated design for parallelizing RL environments, we improve the environment simulation speed across different hardware setups, ranging from a laptop and a modest workstation to a high-end machine such as the NVIDIA DGX-A100. On a high-end machine, EnvPool achieves 1 million frames per second for environment execution on Atari environments and 3 million frames per second on MuJoCo environments. On a laptop, EnvPool runs 2.8 times faster than the Python subprocess-based baseline. Moreover, its compatibility with existing RL training libraries has been demonstrated in the open-source community, including CleanRL, rl_games, and DeepMind Acme. Finally, EnvPool allows researchers to iterate on their ideas at a much faster pace and has great potential to become the de facto RL environment execution engine. Example runs show that it takes only 5 minutes to train Atari Pong and MuJoCo Ant, both on a laptop. EnvPool is open-sourced at https://github.com/sail-sg/envpool.
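As a minimal sketch (not part of the abstract), the snippet below illustrates the kind of batched, Gym-style usage the compatibility claim refers to, assuming the public `envpool.make` interface; the environment id, `num_envs` value, and printed shapes are illustrative only.

```python
import numpy as np
import envpool

# Create a batched Atari Pong environment exposing a Gym-style interface.
# A single step call advances all 16 environments in parallel.
env = envpool.make("Pong-v5", env_type="gym", num_envs=16)

obs = env.reset()                       # batched observations, one per environment
actions = np.zeros(16, dtype=int)       # placeholder actions for all 16 environments
obs, rew, done, info = env.step(actions)
print(obs.shape, rew.shape)             # leading dimension equals num_envs
```

Because the returned arrays are already batched across environments, the same loop structure plugs into vectorized training code in libraries such as CleanRL with little modification.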