Researchers and practitioners in the field of reinforcement learning (RL) frequently leverage parallel computation, which has led to a plethora of new algorithms and systems in the last few years. In this paper, we re-examine the challenges posed by distributed RL and try to view it through the lens of an old idea: distributed dataflow. We show that viewing RL as a dataflow problem leads to highly composable and performant implementations. We propose AnonFlow, a hybrid actor-dataflow programming model for distributed RL, and validate its practicality by porting the full suite of algorithms in AnonLib, a widely-adopted distributed RL library.
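To make the dataflow framing concrete, the following is a minimal sketch of how an RL training loop can be expressed as a composition of dataflow operators over streams of experience. All names here (rollouts, concat_batches, train_one_step) are illustrative assumptions for this sketch, not AnonFlow's actual API.

```python
"""Minimal sketch of RL-as-dataflow using plain Python iterators.

Operator names are hypothetical illustrations of the idea, not
AnonFlow's published interface.
"""
import random
from itertools import islice
from typing import Dict, Iterator, List


def rollouts(env_seed: int) -> Iterator[Dict[str, float]]:
    """Source operator: an endless stream of (toy) experience records."""
    rng = random.Random(env_seed)
    while True:
        yield {"obs": rng.random(), "reward": rng.random()}


def concat_batches(stream: Iterator[Dict[str, float]],
                   size: int) -> Iterator[List[Dict[str, float]]]:
    """Transform operator: group experience records into training batches."""
    while True:
        yield list(islice(stream, size))


def train_one_step(stream: Iterator[List[Dict[str, float]]]) -> Iterator[float]:
    """Sink operator: "update" a model on each batch and emit a metric."""
    for batch in stream:
        yield sum(item["reward"] for item in batch) / len(batch)


# Composing operators yields the whole training pipeline as one dataflow;
# distributing each stage across actors then becomes a scheduling concern
# rather than a restructuring of the algorithm's code.
pipeline = train_one_step(concat_batches(rollouts(env_seed=0), size=32))
for metric in islice(pipeline, 3):
    print(f"mean batch reward: {metric:.3f}")
```

Because each stage is an ordinary iterator transformation, stages can be swapped, fused, or parallelized independently, which is the composability property the abstract claims for the dataflow view.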