With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has attracted attention from various fields. However, DRL computation on modern powerful GPU platforms is still inefficient, due to its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design to accelerate multi-GPU DRL via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) to match the actual needs of DRL tasks, an adaptive GMI management strategy to simultaneously achieve high GPU utilization and computation throughput, and highly efficient inter-GMI communication support to meet the demands of various DRL communication patterns. Comprehensive experiments reveal that GMI-DRL outperforms the state-of-the-art NVIDIA Isaac Gym with NCCL (up to 2.81X) and Horovod (up to 2.34X) support in training throughput on the latest DGX-A100 platform. Our work provides an initial user experience with GPU spatial multiplexing in processing heterogeneous workloads consisting of a mixture of computation and communication.