Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges posed by distributed RL and view them through the lens of an old idea: distributed dataflow. We show that viewing RL as a dataflow problem leads to highly composable and performant implementations. We propose RLlib Flow, a hybrid actor-dataflow programming model for distributed RL, and validate its practicality by porting the full suite of algorithms in RLlib, a widely adopted distributed RL library. Concretely, RLlib Flow provides 2-9× code savings in real production code and enables the composition of multi-agent algorithms that were not previously possible for end users. The open-source code is available as part of RLlib at https://github.com/ray-project/ray/tree/master/rllib.
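To give a flavor of the actor-dataflow style, the sketch below shows how a simple synchronous training loop might be expressed as a pipeline of operators over experience batches. This is a minimal sketch assuming RLlib's execution-operator API (ParallelRollouts, ConcatBatches, TrainOneStep, StandardMetricsReporting); exact module paths, operator names, and signatures vary across RLlib versions, so treat it as illustrative rather than a definitive listing.

```python
# Sketch of a synchronous training dataflow in the RLlib Flow style.
# Assumes the execution-operator API shipped with RLlib 1.x; names and
# signatures may differ in other versions.
from ray.rllib.execution.rollout_ops import ParallelRollouts, ConcatBatches
from ray.rllib.execution.train_ops import TrainOneStep
from ray.rllib.execution.metric_ops import StandardMetricsReporting

def execution_plan(workers, config):
    # Iterator over experience batches collected in parallel from the
    # rollout workers, stepped in bulk-synchronous fashion.
    rollouts = ParallelRollouts(workers, mode="bulk_sync")

    # Concatenate rollout fragments into full train batches, then run
    # one SGD update per batch and broadcast the updated weights.
    train_op = rollouts.combine(
        ConcatBatches(min_batch_size=config["train_batch_size"])
    ).for_each(TrainOneStep(workers))

    # Attach standard metrics collection and reporting to the dataflow.
    return StandardMetricsReporting(train_op, workers, config)
```

Because each stage is an ordinary operator over a stream of batches, variations such as asynchronous sampling or replay can be composed by swapping or inserting operators rather than rewriting the training loop.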