Deep reinforcement learning (DRL) has revolutionized learning and actuation in applications such as game playing and robotic control. However, the cost of data collection, i.e., generating transitions from agent-environment interactions, remains a major challenge for wider DRL adoption in complex real-world problems. Following a cloud-native paradigm to train DRL agents on a GPU cloud platform is a promising solution. In this paper, we present a scalable and elastic library, ElegantRL-podracer, for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels. At a high level, ElegantRL-podracer employs a tournament-based ensemble scheme to orchestrate the training process on hundreds or even thousands of GPUs, scheduling the interactions between a leaderboard and a training pool with hundreds of pods. At a low level, each pod simulates agent-environment interactions in parallel by fully utilizing the nearly 7,000 CUDA cores of a single GPU. ElegantRL-podracer features high scalability, elasticity, and accessibility by following the development principles of containerization, microservices, and MLOps. Using an NVIDIA DGX SuperPOD cloud, we conduct extensive experiments on various tasks in locomotion and stock trading and show that ElegantRL-podracer substantially outperforms RLlib. Our code is available on GitHub.
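To make the tournament-based ensemble idea concrete, the following is a minimal, hypothetical sketch of how a leaderboard could coordinate many training pods: each pod pulls a strong agent from the leaderboard, trains and evaluates it independently, and submits the result back, while the leaderboard retains only the top performers. The names (`Leaderboard`, `train_and_evaluate`) and the single-process loop are illustrative assumptions, not ElegantRL-podracer's actual API or scheduler.

```python
# Hypothetical sketch of a tournament-based ensemble with a leaderboard.
# Not ElegantRL-podracer's real implementation; all names are illustrative.
import heapq
import random
from typing import Any, List, Tuple


class Leaderboard:
    """Keeps the top-k (score, agent) pairs seen so far."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self._entries: List[Tuple[float, int, Any]] = []  # min-heap keyed by score
        self._counter = 0  # tie-breaker so heapq never compares agent objects

    def submit(self, score: float, agent: Any) -> None:
        heapq.heappush(self._entries, (score, self._counter, agent))
        self._counter += 1
        if len(self._entries) > self.capacity:
            heapq.heappop(self._entries)  # drop the currently worst agent

    def sample(self) -> Any:
        # A pod restarts from a randomly chosen top agent (the "tournament" step).
        return random.choice(self._entries)[2]


def train_and_evaluate(agent: Any) -> Tuple[Any, float]:
    """Placeholder for one pod's work: train the agent on its own GPU(s),
    evaluate it, and return the updated agent with its score."""
    new_agent = agent          # training would update the agent's parameters here
    score = random.random()    # evaluation would return e.g. an episode return
    return new_agent, score


if __name__ == "__main__":
    board = Leaderboard(capacity=4)
    board.submit(0.0, {"params": None})   # seed the leaderboard with an initial agent

    for generation in range(100):         # orchestrator loop; one iteration per pod job
        agent = board.sample()            # pod pulls a strong agent
        trained, score = train_and_evaluate(agent)
        board.submit(score, trained)      # pod pushes the trained agent back
```

In the actual library, the pod jobs would run concurrently on separate GPUs and the leaderboard would be a shared service rather than an in-process object; the sketch only conveys the pull-train-evaluate-submit cycle described in the abstract.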