Reinforcement learning (RL) trains many agents, which is resource-intensive and must scale to large GPU clusters. Different RL training algorithms offer different opportunities for distributing and parallelising the computation. Yet, current distributed RL systems tie the definition of RL algorithms to their distributed execution: they hard-code particular distribution strategies and only accelerate specific parts of the computation (e.g. policy network updates) on GPU workers. Fundamentally, current systems lack abstractions that decouple RL algorithms from their execution. We describe MindSpore Reinforcement Learning (MSRL), a distributed RL training system that supports distribution policies that govern how RL training computation is parallelised and distributed on cluster resources, without requiring changes to the algorithm implementation. MSRL introduces the new abstraction of a fragmented dataflow graph, which maps Python functions from an RL algorithm's training loop to parallel computational fragments. Fragments are executed on different devices by translating them to low-level dataflow representations, e.g. computational graphs as supported by deep learning engines, CUDA implementations or multi-threaded CPU processes. We show that MSRL subsumes the distribution strategies of existing systems, while scaling RL training to 64 GPUs.
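To make the fragmented-dataflow idea concrete, the following is a minimal, hypothetical Python sketch of how an RL training loop written as plain functions could be wrapped into placement-agnostic fragments. The names Fragment, FragmentGraph, actor_rollout and learner_update are illustrative assumptions, not MSRL's actual API; the point is only that the algorithm code stays unchanged while a distribution policy decides device placement.

```python
# Hypothetical sketch: decouple an RL training loop (plain Python functions)
# from its execution by wrapping each step as a "fragment" with a device tag.
from dataclasses import dataclass, field
from typing import Any, Callable, List

@dataclass
class Fragment:
    name: str
    fn: Callable[..., Any]
    device: str = "CPU"   # placement chosen by a distribution policy, not the algorithm

    def run(self, *args):
        # A real system would lower this to a deep-learning computational graph,
        # a CUDA implementation, or a CPU thread pool; here we just call the function.
        return self.fn(*args)

@dataclass
class FragmentGraph:
    fragments: List[Fragment] = field(default_factory=list)

    def add(self, name: str, fn: Callable[..., Any], device: str = "CPU") -> Fragment:
        frag = Fragment(name, fn, device)
        self.fragments.append(frag)
        return frag

# Algorithm code: ordinary Python, unaware of where it will execute.
def actor_rollout(policy_params):
    return [("state", "action", 1.0)]                    # stand-in trajectory

def learner_update(policy_params, trajectories):
    return policy_params + 0.01 * len(trajectories)      # stand-in gradient step

graph = FragmentGraph()
rollout = graph.add("rollout", actor_rollout, device="GPU:0")
update = graph.add("update", learner_update, device="GPU:1")

params = 0.0
for _ in range(3):                                       # training loop over fragments
    traj = rollout.run(params)
    params = update.run(params, traj)
print(params)
```

Changing the device tags (or the number of rollout fragments) alters how the loop is distributed without touching actor_rollout or learner_update, which mirrors the decoupling the abstract describes.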