Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, and healthcare. However, training RL agents is very time-consuming. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and thread-level synchronization overheads on CPU. In this work, we propose a framework for generating scalable reinforcement learning implementations on multi-core systems. The Replay Buffer is a key component of RL algorithms; it stores samples obtained from environmental interactions and supports data sampling for the learning process. We define a new data structure for the Prioritized Replay Buffer based on a $K$-ary sum tree that supports asynchronous parallel insertions, sampling, and priority updates. To address the challenge of irregular memory accesses, we propose a novel data layout for storing the nodes of the sum tree that reduces the number of cache misses. Additionally, we propose a $\textit{lazy writing}$ mechanism to reduce the thread-level synchronization overheads of Replay Buffer operations. Our framework employs parallel actors to concurrently collect data via environmental interactions, and parallel learners to perform stochastic gradient descent using the collected data. It supports a wide range of reinforcement learning algorithms, including DQN and DDPG. We demonstrate the effectiveness of our framework in accelerating RL algorithms through experiments on a CPU + GPU platform using OpenAI benchmarks.
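To make the $K$-ary sum tree idea concrete, the sketch below shows a minimal, single-threaded prioritized replay buffer that stores leaf priorities in a flat array and samples proportionally to priority. The class name, branching factor default, and method signatures are illustrative assumptions; the paper's asynchronous parallel operations, cache-friendly node layout, and lazy-writing mechanism are not reproduced here.

```python
import random


class KarySumTree:
    """Minimal K-ary sum tree for proportional prioritized sampling.

    Illustrative single-threaded sketch only; names and signatures are
    assumptions for the example, not the framework's actual API.
    """

    def __init__(self, capacity, k=4):
        self.k = k
        self.capacity = capacity
        # Smallest complete K-ary tree whose leaf level can hold `capacity` items.
        leaves = 1
        while leaves < capacity:
            leaves *= k
        self.offset = (leaves - 1) // (k - 1)   # number of internal nodes
        self.tree = [0.0] * (self.offset + leaves)
        self.data = [None] * capacity
        self.next_slot = 0                      # circular write position

    def insert(self, sample, priority):
        """Store a sample at the next leaf slot and set its priority."""
        leaf = self.next_slot
        self.data[leaf] = sample
        self.update(leaf, priority)
        self.next_slot = (self.next_slot + 1) % self.capacity

    def update(self, leaf, priority):
        """Set a leaf's priority and propagate the change up to the root."""
        idx = self.offset + leaf
        delta = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 0:
            idx = (idx - 1) // self.k           # parent in the flat array
            self.tree[idx] += delta

    def sample(self):
        """Draw a leaf with probability proportional to its priority."""
        target = random.uniform(0.0, self.tree[0])
        idx = 0
        while idx < self.offset:                # descend until a leaf is reached
            first = self.k * idx + 1
            nxt = first + self.k - 1            # default to last child (float guard)
            for child in range(first, first + self.k):
                if target <= self.tree[child]:
                    nxt = child
                    break
                target -= self.tree[child]
            idx = nxt
        leaf = idx - self.offset
        return leaf, self.data[leaf], self.tree[idx]
```

A typical use would be `buf = KarySumTree(capacity=1_000_000, k=8)`, inserting transitions with their current priorities and calling `sample()` during learning; a larger branching factor $K$ shortens the tree and reduces the number of nodes touched per operation, which is the property the paper's cache-aware layout exploits.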